Schema for SNPs (131) - Simple Nucleotide Polymorphisms (dbSNP build 131)
Database: hg19    Primary Table: snp131    Row Count: 26,033,053
Format description: Polymorphism data from the dbSNP database or genotyping arrays
field | example | SQL type | description
bin | 585 | smallint(5) unsigned | Indexing field to speed chromosome range queries.
chrom | chr1 | varchar(31) | Reference sequence chromosome or scaffold
chromStart | 10433 | int(10) unsigned | Start position in chrom
chromEnd | 10433 | int(10) unsigned | End position in chrom
name | rs56289060 | varchar(15) | dbSNP Reference SNP identifier
score | 0 | smallint(5) unsigned | Not used
strand | + | enum('+','-') | Which DNA strand contains the observed alleles
refNCBI | - | blob | Reference genomic sequence from dbSNP
refUCSC | - | blob | Reference genomic sequence from UCSC lookup of chrom,chromStart,chromEnd
observed | -/C | varchar(255) | The sequences of the observed alleles from rs-fasta files
molType | genomic | enum('unknown','genomic','cDNA') | Sample type from exemplar submitted sequence (ss)
class | insertion | enum('unknown','single','in-del','het','microsatellite','named','mixed','mnp','insertion','deletion') | Class of variant (single, in-del, named, mixed, etc.)
valid | unknown | set('unknown','by-cluster','by-frequency','by-submitter','by-2hit-2allele','by-hapmap','by-1000genomes') | Validation status of the SNP
avHet | 0 | float | Average heterozygosity from all observations
avHetSE | 0 | float | Standard Error for the average heterozygosity
func | near-gene-5 | set('unknown','coding-synon','intron','coding-synonymy-unknown','near-gene-3','near-gene-5','nonsense','missense','frameshift','cds-indel','untranslated-3','untranslated-5','splice-3','splice-5') | Functional category of the SNP (coding-synon, coding-nonsynon, intron, etc.)
locType | between | enum('range','exact','between','rangeInsertion','rangeSubstitution','rangeDeletion') | Type of mapping inferred from size on reference; may not agree with class
weight | 1 | int(10) unsigned | The quality of the alignment: 1 = unique mapping, 2 = non-unique, 3 = many matches
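The bin field can be illustrated with a short Python sketch. It assumes the table uses the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger than the one below); those constants are not stated on this page, and the function name bin_from_range is illustrative only.

```python
# Sketch of the standard UCSC binning scheme (assumed, not stated on this page).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each higher level is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end).

    start is 0-based and end is exclusive, as in chromStart/chromEnd.
    """
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Move up one level: bins become eight times wider.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard bin scheme")


# Example row from the table above: chr1, chromStart 10433, chromEnd 10433.
example_bin = bin_from_range(10433, 10433)
```

Under these assumptions the example row maps to bin 585, matching the value in the table. A range query can then restrict its search to the few bins that could overlap the requested interval before comparing chromStart and chromEnd.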
Connected Tables and Joining Fields
      hg19.gwasCatalog.name (via snp131.name)
      hg19.snp131CodingDbSnp.name (via snp131.name)
      hg19.snp131Exceptions.name (via snp131.name)
      hg19.snp131OrthoPt2Pa2Rm2.name (via snp131.name)
      hg19.snp131Seq.acc (via snp131.name)
Sample Rows
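bin | chrom | chromStart | chromEnd | name | score | strand | refNCBI | refUCSC | observed | molType | class | valid | avHet | avHetSE | func | locType | weight
585 | chr1 | 10433 | 10433 | rs56289060 | 0 | + | - | - | -/C | genomic | insertion | unknown | 0 | 0 | near-gene-5 | between | 1
585 | chr1 | 10491 | 10492 | rs55998931 | 0 | + | C | C | C/T | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10518 | 10519 | rs62636508 | 0 | + | G | G | C/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10582 | 10583 | rs58108140 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10827 | 10828 | rs10218492 | 0 | + | G | G | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10903 | 10904 | rs10218493 | 0 | + | G | G | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10926 | 10927 | rs10218527 | 0 | + | A | A | A/G | genomic | single | by-cluster | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 10937 | 10938 | rs28853987 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1
585 | chr1 | 11001 | 11002 | rs79537094 | 0 | + | A | A | A/C | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 3
585 | chr1 | 11013 | 11014 | rs28484712 | 0 | + | G | G | A/G | genomic | single | unknown | 0 | 0 | near-gene-5 | exact | 1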

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
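As a minimal sketch of what the 0-based start means in practice, the Python below converts chromStart/chromEnd to 1-based, fully closed coordinates. It assumes the usual UCSC convention that chromEnd is exclusive, which is consistent with the sample rows, where single-base SNPs have chromEnd = chromStart + 1. The function names are illustrative.

```python
from typing import Tuple


def variant_width(chrom_start: int, chrom_end: int) -> int:
    """Number of reference bases covered (0 for an insertion between two bases)."""
    return chrom_end - chrom_start


def to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based, end-exclusive range to 1-based, end-inclusive."""
    return chrom_start + 1, chrom_end


# rs55998931 from the sample rows: chromStart 10491, chromEnd 10492.
snp_width = variant_width(10491, 10492)
snp_one_based = to_one_based(10491, 10492)

# rs56289060 is an insertion: chromStart equals chromEnd, so the width is zero.
insertion_width = variant_width(10433, 10433)
```

For a zero-width insertion the 1-based conversion yields a start that is one greater than the end, which can be read as "between these two bases" rather than as a range of bases.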

SNPs (131) (snp131) Track Description

Description

This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 131, available from ftp.ncbi.nih.gov/snp.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases.

The configuration categories reflect the following definitions (not all categories apply to this assembly):

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page, displayed beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g., how close a near-gene SNP is to a nearby gene).

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
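The length comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the UCSC implementation: it assumes "-" denotes an empty allele (as in the refNCBI and observed columns of the sample rows), that the reference allele appears verbatim among the observed alleles, and that strand differences are ignored. The function names are hypothetical.

```python
def allele_length(allele: str) -> int:
    """Length of an allele, treating '-' as an empty (zero-length) allele."""
    return 0 if allele == "-" else len(allele)


def refine_indel_class(ref_allele: str, observed: str) -> str:
    """Refine dbSNP's 'in-del' class to 'insertion' or 'deletion' where possible.

    observed is a slash-separated allele list such as '-/C'.
    """
    ref_len = allele_length(ref_allele)
    # "Other" alleles are all observed alleles except the reference allele.
    other_lengths = [
        allele_length(a) for a in observed.split("/") if a != ref_allele
    ]
    if not other_lengths:
        return "in-del"
    if all(ref_len < n for n in other_lengths):
        return "insertion"
    if all(ref_len > n for n in other_lengths):
        return "deletion"
    return "in-del"


# rs56289060 from the sample rows: reference '-', observed '-/C'.
example_class = refine_indel_class("-", "-/C")
```

With these assumptions the sample row rs56289060 is classified as an insertion, in agreement with its class column. A variant whose other alleles are both shorter and longer than the reference stays 'in-del'.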

UCSC Annotations

UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found:

Another condition, which does not necessarily imply any problem, is noted:

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition.

Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period <= 12) is shown in lower case, and matching bases are indicated by a "+".
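The match-line convention described above can be sketched in Python. This is only an illustration: it assumes the two sequences are already aligned and of equal length, that repeat-masked genomic bases are lower case, and that mismatches are left blank (the page does not say how mismatches or gaps are drawn). The function name match_line is hypothetical.

```python
def match_line(genomic: str, flanking: str) -> str:
    """Build the line drawn between aligned genomic and flanking sequences.

    '|' marks a match on non-repetitive (upper-case) genomic sequence,
    '+' marks a match on repetitive (lower-case) genomic sequence,
    and a space marks a mismatch (an assumption of this sketch).
    """
    if len(genomic) != len(flanking):
        raise ValueError("sequences must be aligned to the same length")
    marks = []
    for g, f in zip(genomic, flanking):
        if g.upper() != f.upper():
            marks.append(" ")
        elif g.islower():
            marks.append("+")
        else:
            marks.append("|")
    return "".join(marks)


# Made-up sequences for illustration only: the last four genomic bases are repeat-masked.
example_marks = match_line("ACGTacgt", "ACCTACGA")
```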

Data Sources

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/.

Orthologous Alleles (human assemblies only)

Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the criteria:

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
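To illustrate the masking idea, the Python sketch below replaces bases with IUPAC ambiguity codes built from a SNP's observed alleles. The IUPAC code table is standard; everything else is an assumption of this sketch, including the choice to build the code from the observed alleles alone, the use of 0-based offsets into the sequence, and the function names. It is not the procedure used to produce the downloadable files, and it applies no filtering of problematic SNPs.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
IUPAC = {frozenset(bases): code for bases, code in _CODES.items()}


def iupac_code(alleles) -> str:
    """Return the IUPAC code for a collection of single-base alleles."""
    return IUPAC[frozenset(a.upper() for a in alleles)]


def mask_sequence(seq: str, snps) -> str:
    """Mask seq at each single-base substitution.

    snps is an iterable of (offset, observed) pairs, where offset is a
    0-based index into seq and observed is a string such as 'C/T'.
    """
    bases = list(seq)
    for offset, observed in snps:
        bases[offset] = iupac_code(observed.split("/"))
    return "".join(bases)


# Made-up five-base sequence with two substitutions, for illustration only.
masked = mask_sequence("ACGTC", [(1, "C/T"), (2, "A/G")])
```

In this toy example the C/T position becomes Y and the A/G position becomes R. Insertions and deletions (alleles such as '-') are deliberately unsupported here, in line with the note that only single-base substitutions were used for masking.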

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.