Schema for CpG Islands - CpG Islands (Islands < 300 Bases are Light Green)
Database: hg19    Primary Table: cpgIslandExt    Row Count: 28,691
Format description: Describes the CpG Islands (includes observed/expected ratio)
field       example    SQL type           info     description
bin         585        smallint(6)        range    Indexing field to speed chromosome range queries.
chrom       chr1       varchar(255)       values   Reference sequence chromosome or scaffold
chromStart  28735      int(10) unsigned   range    Start position in chromosome
chromEnd    29810      int(10) unsigned   range    End position in chromosome
name        CpG: 116   varchar(255)       values   CpG Island
length      1075       int(10) unsigned   range    Island Length
cpgNum      116        int(10) unsigned   range    Number of CpGs in island
gcNum       787        int(10) unsigned   range    Number of C and G in island
perCpg      21.6       float              range    Percentage of island that is CpG
perGc       73.2       float              range    Percentage of island that is C or G
obsExp      0.83       float              range    Ratio of observed (cpgNum) to expected (numC*numG/length) CpG in island
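The bin column comes from UCSC's hierarchical binning scheme, which groups features so that a range query only needs to examine a few bins per level. Below is a minimal sketch of how a bin could be computed from chromStart and chromEnd. It assumes the standard (non-extended) scheme, which has five levels: 128 kb bins at the bottom, and each level above is eight times coarser. The constant and function names are our own and are not a UCSC API. UCSC uses a separate extended scheme for sequences longer than 512 Mb, so this sketch simply rejects such ranges. The sketch reproduces the bin values shown in the sample rows.

```python
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]  # smallest bins first
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131,072 bases
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger
MAX_END = 1 << 29     # 512 Mb: the limit of the standard scheme


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest standard bin fully containing [start, end).

    start and end are 0-based, half-open coordinates (hypothetical helper).
    """
    if end > MAX_END:
        raise ValueError("ranges beyond 512 Mb need the extended scheme")
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Move up one level: bins become 8 times larger.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise AssertionError("unreachable when end <= 512 Mb")


if __name__ == "__main__":
    print(bin_from_range(28735, 29810))    # 585, first sample row
    print(bin_from_range(788863, 789211))  # 591, last sample row
```

All ten sample islands are small enough to fall in a 128 kb bin, so their bins are 585 plus chromStart divided by 131,072 (rounded down).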
Sample Rows
bin  chrom  chromStart  chromEnd  name      length  cpgNum  gcNum  perCpg  perGc  obsExp
585  chr1   28735       29810     CpG: 116  1075    116     787    21.6    73.2   0.83
586  chr1   135124      135563    CpG: 30   439     30      295    13.7    67.2   0.64
587  chr1   327790      328229    CpG: 29   439     29      295    13.2    67.2   0.62
588  chr1   437151      438164    CpG: 84   1013    84      734    16.6    72.5   0.64
588  chr1   449273      450544    CpG: 99   1271    99      777    15.6    61.1   0.84
589  chr1   533219      534114    CpG: 94   895     94      570    21      63.7   1.04
589  chr1   544738      546649    CpG: 171  1911    171     1405   17.9    73.5   0.67
590  chr1   713984      714547    CpG: 60   563     60      385    21.3    68.4   0.92
590  chr1   762416      763445    CpG: 115  1029    115     673    22.4    65.4   1.07
591  chr1   788863      789211    CpG: 28   348     28      192    16.1    55.2   1.06

Note: all start coordinates in our database are 0-based, not 1-based. Intervals are half-open: chromEnd is one past the last base, so chromEnd - chromStart equals the island length.
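Below is a minimal sketch for reading cpgIslandExt rows and checking them against the schema. It assumes tab-separated lines with the fields in schema order, as in a UCSC table dump. The class and method names are hypothetical. chromStart is 0-based and chromEnd is exclusive, so converting to the familiar 1-based, closed form means adding 1 to chromStart only. The is_consistent check also confirms that length, perCpg and perGc follow from the raw columns, allowing for the one-decimal rounding in the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CpgIsland:
    """One cpgIslandExt row; field names mirror the table schema."""
    bin: int
    chrom: str
    chromStart: int  # 0-based position of the first base
    chromEnd: int    # exclusive: one past the last base
    name: str
    length: int
    cpgNum: int
    gcNum: int
    perCpg: float
    perGc: float
    obsExp: float

    @classmethod
    def from_tsv(cls, line: str) -> CpgIsland:
        """Parse one tab-separated row (field order assumed = schema order)."""
        f = line.rstrip("\n").split("\t")
        return cls(int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]),
                   int(f[6]), int(f[7]), float(f[8]), float(f[9]), float(f[10]))

    def one_based(self) -> tuple[int, int]:
        """Return (start, end) as 1-based, closed coordinates."""
        return self.chromStart + 1, self.chromEnd

    def is_consistent(self, tol: float = 0.1) -> bool:
        """Check that the derived columns follow from the raw ones."""
        # perCpg counts both bases of every CpG, hence 200 * cpgNum / length.
        return (self.length == self.chromEnd - self.chromStart
                and abs(self.perCpg - 200 * self.cpgNum / self.length) <= tol
                and abs(self.perGc - 100 * self.gcNum / self.length) <= tol)


if __name__ == "__main__":
    row = "585\tchr1\t28735\t29810\tCpG: 116\t1075\t116\t787\t21.6\t73.2\t0.83"
    island = CpgIsland.from_tsv(row)
    print(island.one_based())      # (28736, 29810)
    print(island.is_consistent())  # True
```

obsExp is not checked here because the table stores only the combined C+G count, not the separate C and G counts its formula needs.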

CpG Islands (cpgIslandExt) Track Description

Description

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. They are common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. As a result, CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

Methods

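CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated against the following criteria: GC content of 50% or greater, length greater than 200 bp, and a ratio of observed to expected CpG (based on the number of Cs and Gs in the segment) greater than 0.6.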

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the number of bases in CpG dinucleotides (twice the CpG count) divided by the island length, expressed as a percentage. The ratio of observed to expected CpG is calculated according to the formula cited in Gardiner-Garden and Frommer (1987) in the References section below:


    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N = length of sequence.
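The scoring scan and the island statistics described above can be sketched together as follows. This is a simple one-pass heuristic under stated assumptions. It is not the Miklem and Hillier program. high_scoring_segments keeps a running score (+17 per CG, -1 per other dinucleotide), restarts it whenever it falls to zero or below, and trims each run back to the position where the score peaked. island_stats computes the table columns for one segment, using the Obs/Exp formula above. The cutoffs in find_islands are assumed values: length over 200 bp, GC content of at least 50%, and Obs/Exp above 0.6. All function names and the toy sequence are hypothetical.

```python
from __future__ import annotations

CG_SCORE, OTHER_SCORE = 17, -1  # per-dinucleotide scores from the text


def high_scoring_segments(seq: str) -> list[tuple[int, int]]:
    """One-pass scan returning (start, end) segments, 0-based and half-open."""
    seq = seq.upper()
    segments = []
    score = best = 0
    start = best_end = 0
    for i in range(len(seq) - 1):
        score += CG_SCORE if seq[i:i + 2] == "CG" else OTHER_SCORE
        if score > best:
            best, best_end = score, i + 2  # include the G of this CG
        if score <= 0:
            if best > 0:
                segments.append((start, best_end))  # trim back to the peak
            score = best = 0
            start = i + 1
    if best > 0:
        segments.append((start, best_end))
    return segments


def island_stats(seq: str) -> dict:
    """cpgIslandExt-style columns for one segment."""
    seq = seq.upper()
    n = len(seq)
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so count() is exact
    c, g = seq.count("C"), seq.count("G")
    return {
        "length": n,
        "cpgNum": cpg,
        "gcNum": c + g,
        "perCpg": 100 * 2 * cpg / n,  # CpG bases are twice the CpG count
        "perGc": 100 * (c + g) / n,
        "obsExp": cpg * n / (c * g) if c and g else 0.0,
    }


def find_islands(seq: str, min_len: int = 200, min_gc: float = 50.0,
                 min_obs_exp: float = 0.6) -> list[tuple[int, int, dict]]:
    """Keep segments that pass the (assumed) length, GC and Obs/Exp cutoffs."""
    islands = []
    for start, end in high_scoring_segments(seq):
        stats = island_stats(seq[start:end])
        if (stats["length"] > min_len and stats["perGc"] >= min_gc
                and stats["obsExp"] > min_obs_exp):
            islands.append((start, end, stats))
    return islands


if __name__ == "__main__":
    toy = "AT" * 50 + "CG" * 150 + "AT" * 50  # hypothetical test sequence
    for start, end, stats in find_islands(toy):
        print(start, end, stats["obsExp"])  # 100 400 2.0
```

A pure CG repeat gives Obs/Exp = 2.0 because a stretch of N bases holds N/2 CpGs and N/2 each of C and G. One limitation of this heuristic affects a smaller CpG cluster that follows a peak without driving the score back to zero. Such a cluster is either merged into the segment (if it pushes the score above the earlier peak) or trimmed away. Exact algorithms for all maximal scoring subsequences, such as Ruzzo and Tompa's, report such clusters separately.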

Credits

This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished).

References

Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J. Mol. Biol. 1987 Jul 20;196(2):261-82.