A significant percentage of the Arabidopsis genome lies outside of gene regions, and an even greater portion is not translated into protein. Part of this non-genic sequence consists of repeat elements, some of which can be classified as transposable or transposable-like elements. Transposable genetic elements are commonplace and often exhibit easily recognized features that aid in their identification and annotation. The following drawings display these features for some common elements of the Arabidopsis genome.
First, the transposable element (Tn) recognizes the target site. An autonomous Tn encodes the enzymes for cleaving the DNA and integrating into the genome; non-autonomous Tns do not encode these enzymes, but can be acted upon by them. Second, the genomic DNA is cleaved and one end of the Tn joins. Third, the other Tn end joins and the genomic DNA is repaired, resulting in a duplication of the target site (TDS).
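The net effect of the three steps above on the sequence can be sketched in a few lines of Python. This is only an illustration of why the target site ends up duplicated on both sides of the element; the function name `insert_with_tds` and its parameters are hypothetical, and the enzymatic details (staggered cleavage, gap repair) are reduced to simple string operations.

```python
def insert_with_tds(genome: str, element: str, pos: int, tds_len: int) -> str:
    """Insert `element` into `genome` at `pos`, duplicating the target site.

    The target site is taken to be the `tds_len` bases starting at `pos`
    (an assumption made for this sketch). After insertion, one copy of the
    target site flanks each end of the element.
    """
    if tds_len < 0 or pos < 0 or pos + tds_len > len(genome):
        raise ValueError("target site must lie within the genome")
    target = genome[pos:pos + tds_len]
    # A staggered cut leaves the target site single-stranded on both sides
    # of the joined element; repair fills in both gaps, so the target site
    # appears once before and once after the element.
    return genome[:pos] + target + element + target + genome[pos + tds_len:]


if __name__ == "__main__":
    # Element shown in lower case so the duplicated target site stands out.
    print(insert_with_tds("AAAACGTTTTT", "tagc", pos=4, tds_len=3))
```

Comparing the sequence on either side of a suspected element for such a short direct repeat is what the identification steps below rely on.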

Identification: TBLASTX similarity searches of intergenic regions frequently produce significant matches to intergenic regions of annotated genomic sequence. The match can be either to a previously identified transposon or, more likely, to a putative transposon that has been identified by genomic sequencing but has not been fully characterized. A pairwise or multiple sequence alignment identifies the region of high similarity (typically 85-95% identical) and the point where that similarity drops. It is at this boundary that the inverted repeats (IRs) and target duplication sites (TDSs) can be found, if they exist. The sequence of the inverted repeat is conserved in each of the aligned sequences, while the target duplication sites differ from copy to copy. Short inverted repeats range from 6 to 15 bp, with a single mismatch allowed. Larger IRs, as found in Mu and Mu-type elements, are readily detected by dotplot analysis. Until we adopt an existing algorithm to search thoroughly for this particular end architecture, the IRs and TDSs are identified by dotplot analysis, pairwise alignment, and manual inspection.
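As a rough illustration of the end architecture described above, the following Python sketch checks a candidate element for a short terminal inverted repeat (6-15 bp, at most one mismatch) and then looks for a direct repeat immediately flanking it. It is not the procedure used for the annotation, which relies on dotplots, alignments, and manual inspection. The function names, the requirement that the innermost base pair of the IR match, and the 2-10 bp search range for the TDS are all assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_ir_length(element: str, min_len: int = 6, max_len: int = 15,
                       max_mismatch: int = 1) -> int:
    """Length of the longest terminal inverted repeat, or 0 if none is found.

    The first k bases are compared with the reverse complement of the last
    k bases. The innermost position must match (assumption), so a reported
    repeat never ends on a mismatch.
    """
    element = element.upper()
    for k in range(min(max_len, len(element) // 2), min_len - 1, -1):
        left = element[:k]
        right = reverse_complement(element[-k:])
        mismatches = sum(a != b for a, b in zip(left, right))
        if mismatches <= max_mismatch and left[-1] == right[-1]:
            return k
    return 0


def flanking_tds(sequence: str, start: int, end: int,
                 min_len: int = 2, max_len: int = 10) -> str:
    """Longest exact direct repeat immediately flanking sequence[start:end].

    Returns the repeated bases, or an empty string if no repeat of at
    least `min_len` bases is present.
    """
    sequence = sequence.upper()
    longest = min(max_len, start, len(sequence) - end)
    for k in range(longest, min_len - 1, -1):
        if sequence[start - k:start] == sequence[end:end + k]:
            return sequence[start - k:start]
    return ""


if __name__ == "__main__":
    # Toy element: 8 bp IR, unrelated internal sequence, matching 8 bp IR.
    element = "CAGGGTCA" + "CCGTTTTT" + "TGACCCTG"
    region = "GGATC" + "TTA" + element + "TTA" + "CGCGA"
    start = region.index(element)
    print(terminal_ir_length(element))
    print(flanking_tds(region, start, start + len(element)))
```

Because very short direct repeats occur by chance, a hit from a check like this is only a candidate; the comparison across aligned copies (conserved IR, differing TDS) remains the deciding evidence.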

Identification: These repeats are identified in a manner similar to the IR transposons (described above). Because LINEs vary in length, it is often difficult to identify the TDS. At times, the TDS of the shorter copy can be described. The remnant poly(A) tail changes over time and is usually A-rich rather than exclusively poly(A). Solo LTRs contain a 3' tract similar to tRNAs, a 20-30 bp polypyrimidine tract, and 2 or 3 bp TDSs, and are 200-1000 bp in length. Refer to the paper by Shepherd et al. (1984) for a detailed description of a solo LTR.
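Two of the features mentioned above lend themselves to a quick screen: an A-rich (rather than pure poly(A)) 3' tail, and a polypyrimidine tract. The Python sketch below is one possible way to flag them in a candidate sequence. The function names and the thresholds (at least 80% A over at least 8 bases) are assumptions chosen for illustration; only the 20-30 bp tract length comes from the description above.

```python
import re


def a_rich_tail_length(seq: str, min_a_fraction: float = 0.8,
                       min_len: int = 8) -> int:
    """Length of the longest A-rich stretch ending at the 3' end of `seq`.

    A suffix qualifies if it begins with an A and at least `min_a_fraction`
    of its bases are A. Returns 0 if the best suffix is shorter than
    `min_len`. Both thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    best = 0
    a_count = 0
    for length in range(1, len(seq) + 1):
        base = seq[-length]
        if base == "A":
            a_count += 1
            if a_count / length >= min_a_fraction:
                best = length
    return best if best >= min_len else 0


def longest_polypyrimidine(seq: str) -> str:
    """Longest uninterrupted run of C/T in `seq` (empty string if none)."""
    runs = re.findall(r"[CT]+", seq.upper())
    return max(runs, key=len, default="")


if __name__ == "__main__":
    # Tail with a single C interrupting the A's still counts as A-rich.
    print(a_rich_tail_length("CGTCCGTG" + "AAAACAAAAAAA"))
    tract = longest_polypyrimidine("AGCTTCCTCTAG")
    # Report whether the tract falls in the 20-30 bp range given above.
    print(tract, 20 <= len(tract) <= 30)
```

A screen like this only narrows down candidates; the boundaries still need to be confirmed by alignment and inspection as described for the IR transposons.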
Please contact me directly (by email, at parnell@cshl.org) if you would like to discuss the annotation of transposons and other genomic repeats.
Larry Parnell