Schema for RepeatMasker - Repeating Elements by RepeatMasker
Database: hg19    Primary Table: rmsk    Row Count: 5,298,130
Format description: RepeatMasker .out record
field      example        SQL type              description
bin        585            smallint(5) unsigned  Indexing field to speed chromosome range queries.
swScore    1504           int(10) unsigned      Smith Waterman alignment score
milliDiv   13             int(10) unsigned      Base mismatches in parts per thousand
milliDel   4              int(10) unsigned      Bases deleted in parts per thousand
milliIns   13             int(10) unsigned      Bases inserted in parts per thousand
genoName   chr1           varchar(255)          Genomic sequence name
genoStart  10000          int(10) unsigned      Start in genomic sequence
genoEnd    10468          int(10) unsigned      End in genomic sequence
genoLeft   -249240153     int(11)               -#bases after match in genomic sequence
strand     +              char(1)               Relative orientation + or -
repName    (CCCTAA)n      varchar(255)          Name of repeat
repClass   Simple_repeat  varchar(255)          Class of repeat
repFamily  Simple_repeat  varchar(255)          Family of repeat
repStart   1              int(11)               Start (if strand is +) or -#bases after match (if strand is -) in repeat sequence
repEnd     463            int(11)               End in repeat sequence
repLeft    0              int(11)               -#bases after match (if strand is +) or start (if strand is -) in repeat sequence
id         1              char(1)               First digit of id field in RepeatMasker .out file. Best ignored.
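
The bin value can be reproduced from genoStart and genoEnd. The sketch below assumes the Genome Browser's standard five-level binning scheme (smallest bins of 128 kb, each level eight times larger than the one below it), which this page does not spell out; the function and constant names are illustrative only.

```python
# Offset of the first bin at each level, from the finest level (128 kb bins)
# up to the single bin that covers 512 Mb. Assumed standard scheme.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains a 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a bin boundary at this level; try the next level up.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit this scheme (max 512 Mb)")
```

Under these assumptions, bin_from_range(10000, 10468) gives 585, which agrees with the example value in the table above.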
Sample Rows
bin  swScore  milliDiv  milliDel  milliIns  genoName  genoStart  genoEnd  genoLeft    strand  repName    repClass       repFamily      repStart  repEnd  repLeft  id
585  1504     13        4         13        chr1      10000      10468    -249240153  +       (CCCTAA)n  Simple_repeat  Simple_repeat  1         463     0        1
585  3612     114       270       13        chr1      10468      11447    -249239174  -       TAR1       Satellite      telo           -399      1712    483      2
585  437      235       186       35        chr1      11503      11675    -249238946  -       L1MC       LINE           L1             -2236     5646    5449     3
585  239      294       19        10        chr1      11677      11780    -249238841  -       MER5B      DNA            hAT-Charlie    -74       104     1        4
585  318      230       38        0         chr1      15264      15355    -249235266  -       MIR3       SINE           MIR            -119      143     49       5
585  203      162       0         0         chr1      16712      16749    -249233872  +       (TGG)n     Simple_repeat  Simple_repeat  1         37      0        6
585  239      338       148       0         chr1      18906      19048    -249231573  +       L2a        LINE           L2             2942      3104    -322     7
585  652      346       85        42        chr1      19947      20405    -249230216  +       L3         LINE           CR1            3042      3519    -970     8
585  270      331       7         27        chr1      20530      20679    -249229942  +       Plat_L3    LINE           CR1            2802      2947    -639     9
585  254      279       47        39        chr1      21948      22075    -249228546  +       MLT1K      LTR            ERVL-MaLR      15        142     -453     1
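
As a rough illustration of how such rows might be consumed, the sketch below parses one row and untangles the strand-dependent meaning of repStart and repLeft. It assumes the row is available as tab-separated text in the column order shown above (for example from a table dump); parse_rmsk_row and repeat_coords are hypothetical helper names, not part of any UCSC tool.

```python
FIELDS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]
# Columns with integer SQL types; id is char(1) and is kept as text.
INT_FIELDS = {
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoStart", "genoEnd", "genoLeft", "repStart", "repEnd", "repLeft",
}


def parse_rmsk_row(line: str) -> dict:
    """Split one tab-separated row (assumed layout) into a field -> value dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        row[name] = int(row[name])
    return row


def repeat_coords(row: dict) -> tuple:
    """Return (start, end, bases_left) in the repeat sequence for either strand."""
    if row["strand"] == "+":
        # repStart is the start; repLeft holds minus the bases after the match.
        return row["repStart"], row["repEnd"], -row["repLeft"]
    # On the - strand the two fields swap roles.
    return row["repLeft"], row["repEnd"], -row["repStart"]
```

For the TAR1 row above (strand -, repStart -399, repEnd 1712, repLeft 483), repeat_coords returns a start of 483, an end of 1712, and 399 bases remaining.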

Note: all start coordinates in our database are 0-based, not 1-based.
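
A small sketch of what the 0-based start means in practice, assuming the half-open convention (0-based start, exclusive end) that Genome Browser tables generally follow. The function names are illustrative only.

```python
def match_length(geno_start: int, geno_end: int) -> int:
    """Length of the match; no +1 is needed with a 0-based, half-open range."""
    return geno_end - geno_start


def to_one_based(geno_start: int, geno_end: int) -> tuple:
    """Convert to the 1-based, inclusive positions used in most displays."""
    return geno_start + 1, geno_end


def sequence_length(geno_end: int, geno_left: int) -> int:
    """genoLeft is minus the bases after the match, so subtracting it adds them back."""
    return geno_end - geno_left
```

For the first sample row (genoStart 10000, genoEnd 10468, genoLeft -249240153), the match is 468 bases long, covers positions 10001 to 10468 in 1-based terms, and the implied length of chr1 is 249250621. The second sample row implies the same chr1 length, which is a useful consistency check.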

RepeatMasker (rmsk) Track Description

Description

This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the RepBase library of repeats from the Genetic Information Research Institute (GIRI). RepBase is described in Jurka, J. (2000) in the References section below.

Display Conventions and Configuration

In full display mode, this track displays up to ten different classes of repeats.

The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading.
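
The shading rule can be illustrated with the milliDiv, milliDel, and milliIns fields. The sketch below only shows the direction of the relationship: the number of shade levels, the cap of 500 parts per thousand, and the linear mapping are assumptions made for illustration, not the browser's actual drawing code.

```python
def combined_divergence(milli_div: int, milli_del: int, milli_ins: int) -> int:
    """Sum of mismatches, deletions, and insertions, in parts per thousand."""
    return milli_div + milli_del + milli_ins


def shade_level(milli_div: int, milli_del: int, milli_ins: int,
                levels: int = 10, cap: int = 500) -> int:
    """Map combined divergence to a shade: levels - 1 is darkest, 0 is lightest.

    levels and cap are hypothetical parameters chosen for this example.
    """
    combined = min(combined_divergence(milli_div, milli_del, milli_ins), cap)
    return (levels - 1) - (combined * (levels - 1)) // cap
```

With these assumed parameters, the first sample row (13 + 4 + 13 = 30) would be drawn in the darkest shade, while the L3 row (346 + 85 + 42 = 473) would be drawn close to the lightest.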

Methods

UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet.

Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information.
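
Soft-masking means that repeat bases are written in lower case while the rest of the sequence stays in upper case. The sketch below shows the idea on a toy sequence, assuming repeat positions are given as 0-based, half-open (start, end) pairs as in genoStart and genoEnd; soft_mask and masked_fraction are hypothetical helpers, not RepeatMasker or UCSC functions.

```python
def soft_mask(sequence: str, intervals) -> str:
    """Lower-case the bases inside each 0-based, half-open (start, end) interval."""
    bases = list(sequence)
    for start, end in intervals:
        bases[start:end] = [base.lower() for base in bases[start:end]]
    return "".join(bases)


def masked_fraction(sequence: str) -> float:
    """Fraction of bases that are soft-masked (lower case)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)
```

For example, soft_mask("ACGTACGTAC", [(2, 5)]) returns "ACgtaCGTAC", in which three of the ten bases are masked.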

Credits

Thanks to Arian Smit and GIRI for providing the tools and repeat libraries used to generate this track.

References

Smit, AFA, Hubley, R and Green, P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2007.

RepBase is described in Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420.

For a discussion of repeats in mammalian genomes, see:

Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63.

Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8.