| field | example | SQL type | description |
|---|---|---|---|
| bin | 585 | smallint(5) unsigned | Indexing field to speed chromosome range queries. |
| chrom | chr1 | varchar(31) | Reference sequence chromosome or scaffold |
| chromStart | 10433 | int(10) unsigned | Start position in chrom |
| chromEnd | 10433 | int(10) unsigned | End position in chrom |
| name | rs56289060 | varchar(15) | dbSNP Reference SNP identifier |
| score | 0 | smallint(5) unsigned | Not used |
| strand | + | enum('+','-') | Which DNA strand contains the observed alleles |
| refNCBI | - | blob | Reference genomic sequence from dbSNP |
| refUCSC | - | blob | Reference genomic sequence from UCSC lookup of chrom,chromStart,chromEnd |
| observed | -/C | varchar(255) | The sequences of the observed alleles from rs-fasta files |
| molType | genomic | enum('unknown','genomic','cDNA') | Sample type from exemplar submitted sequence (ss) |
| class | insertion | enum('unknown','single','in-del','het','microsatellite','named','mixed','mnp','insertion','deletion') | Class of variant (single, in-del, named, mixed, etc.) |
| valid | unknown | set('unknown','by-cluster','by-frequency','by-submitter','by-2hit-2allele','by-hapmap','by-1000genomes') | Validation status of the SNP |
| avHet | 0 | float | Average heterozygosity from all observations |
| avHetSE | 0 | float | Standard Error for the average heterozygosity |
| func | near-gene-5 | set('unknown','coding-synon','intron','coding-synonymy-unknown','near-gene-3','near-gene-5','nonsense','missense','frameshift','cds-indel','untranslated-3','untranslated-5','splice-3','splice-5') | Functional category of the SNP (coding-synon, coding-nonsynon, intron, etc.) |
| locType | between | enum('range','exact','between','rangeInsertion','rangeSubstitution','rangeDeletion') | Type of mapping inferred from size on reference; may not agree with class |
| weight | 1 | int(10) unsigned | The quality of the alignment: 1 = unique mapping, 2 = non-unique, 3 = many matches |

| bin | chrom | chromStart | chromEnd | name | score | strand | refNCBI | refUCSC | observed | molType | class | valid | avHet | avHetSE | func | locType | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 585 | chr1 | 10433 | 10433 | rs56289060 | 0 | + | - | - | -/C | genomic | insertion | unknown | 0 | 0 | near-gene-5 | between | 1 |
| 585 | chr1 | 10491 | 10492 | rs55998931 | 0 | + | C | C | C/T | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10518 | 10519 | rs62636508 | 0 | + | G | G | C/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10582 | 10583 | rs58108140 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10827 | 10828 | rs10218492 | 0 | + | G | G | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10903 | 10904 | rs10218493 | 0 | + | G | G | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10926 | 10927 | rs10218527 | 0 | + | A | A | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 10937 | 10938 | rs28853987 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1 |
| 585 | chr1 | 11001 | 11002 | rs79537094 | 0 | + | A | A | A/C | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 3 |
| 585 | chr1 | 11013 | 11014 | rs28484712 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1 |
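
The `bin` column in the rows above can be reproduced from `chromStart` and `chromEnd`. The sketch below assumes the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger); this page does not state the scheme, so the constants and the function name `bin_from_range` are assumptions for illustration only.

```python
# Assumed constants of the standard UCSC binning scheme (not stated on this page).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times larger


def bin_from_range(chrom_start: int, chrom_end: int) -> int:
    """Return the smallest bin that fully contains [chrom_start, chrom_end)."""
    start_bin = chrom_start >> BIN_FIRST_SHIFT
    # Use the last covered base; clamp so a zero-length item at 0 stays in bin 0.
    end_bin = max(chrom_end - 1, 0) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The item straddles a bin boundary at this level, so try a coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range is outside the 512 Mb covered by this scheme")


# First sample row: chr1, chromStart 10433, chromEnd 10433
print(bin_from_range(10433, 10433))  # 585
```

Under this assumption, every sample row falls in bin 585 because all of them lie within the first 128 kb of chr1.
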
Note: all start coordinates in our database are 0-based, not 1-based.
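
As a minimal sketch of what the 0-based start means in practice, the helper below converts a table row's coordinates to a 1-based, inclusive interval. It assumes `chromEnd` is exclusive (a half-open interval), which is consistent with the sample rows: single-base SNPs have `chromEnd = chromStart + 1` and the insertion has `chromEnd = chromStart`. The function name `to_one_based` is hypothetical.

```python
from typing import Tuple


def to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to 1-based, inclusive coordinates.

    Only the start shifts by one; the end keeps its value. For a zero-length
    item (an insertion point) the returned start is one greater than the end.
    """
    if chrom_end < chrom_start:
        raise ValueError("chromEnd must not be smaller than chromStart")
    return chrom_start + 1, chrom_end


# rs55998931 from the sample rows: chromStart 10491, chromEnd 10492
print(to_one_based(10491, 10492))  # (10492, 10492)
```
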
This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 131, available from ftp.ncbi.nih.gov/snp.
Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.
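
The base-level display rule can be sketched as a function of the variant's width on the reference. This is an illustration only, not the browser's drawing code; it assumes the width is `chromEnd - chromStart`, and the name `display_shape` and the returned labels are hypothetical.

```python
def display_shape(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant would appear at base-level resolution."""
    width = chrom_end - chrom_start  # width of the variant on the reference
    if width < 0:
        raise ValueError("chromEnd must not be smaller than chromStart")
    if width == 0:
        # Insertions occupy no reference bases.
        return "tick between two bases"
    if width == 1:
        return "single base"
    return f"block spanning {width} bases"


print(display_shape(10433, 10433))  # tick between two bases
print(display_shape(10491, 10492))  # single base
```
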
The configuration categories reflect the following definitions (not all categories apply to this assembly):
You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene).
dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
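
The length comparison described above might look like the following sketch. It assumes that alleles in `observed` are separated by `/`, that `-` stands for an empty allele, and that the reference allele and observed alleles are given on the same strand; the function name `refine_class` is hypothetical and this is not UCSC's actual code.

```python
def refine_class(snp_class: str, ref_allele: str, observed: str) -> str:
    """Refine dbSNP's 'in-del' class to 'insertion' or 'deletion' where possible."""
    if snp_class != "in-del":
        return snp_class

    def allele_len(allele: str) -> int:
        # Assumption: '-' denotes an empty (zero-length) allele.
        return 0 if allele == "-" else len(allele)

    ref_len = allele_len(ref_allele)
    other_lens = [allele_len(a) for a in observed.split("/") if a != ref_allele]
    if not other_lens:
        return snp_class
    if all(ref_len < n for n in other_lens):
        return "insertion"  # reference is shorter than every other allele
    if all(ref_len > n for n in other_lens):
        return "deletion"   # reference is longer than every other allele
    return snp_class        # mixed lengths: leave as 'in-del'


# First sample row: reference '-' and observed '-/C'
print(refine_class("in-del", "-", "-/C"))  # insertion
```
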
UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found:
dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition.
Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".
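
The match symbols described above can be illustrated with a short sketch. It assumes the two sequences are already aligned to the same length, that repeats are marked by lower case in the genomic string, and that `-` marks a gap; the function name `match_line` is hypothetical, and this is not the code that renders the details page.

```python
def match_line(genomic: str, flanking: str) -> str:
    """Build the row of match symbols shown between genomic and flanking bases."""
    if len(genomic) != len(flanking):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for g, f in zip(genomic, flanking):
        if g == "-" or f == "-" or g.upper() != f.upper():
            marks.append(" ")  # gap or mismatch
        elif g.islower():
            marks.append("+")  # match within repetitive (lower-case) sequence
        else:
            marks.append("|")  # match within non-repetitive sequence
    return "".join(marks)


genomic = "ACGTacgt"
flanking = "ACCTACGA"
print(genomic)
print(match_line(genomic, flanking))
print(flanking)
```
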
The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/
Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria:
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.