Schema for N-SCAN - N-SCAN Gene Predictions
Database: hg19    Primary Table: nscanGene    Row Count: 67,448
Format description: A gene prediction with some additional info.
field         example                         SQL type                            info    description
bin           585                             smallint(5) unsigned                range   Indexing field to speed chromosome range queries.
name          chr1.1.001.a                    varchar(255)                        values  Name of gene (usually transcript_id from GTF)
chrom         chr1                            varchar(255)                        values  Reference sequence chromosome or scaffold
strand        -                               char(1)                             values  + or - for strand
txStart       14695                           int(10) unsigned                    range   Transcription start position
txEnd         29617                           int(10) unsigned                    range   Transcription end position
cdsStart      14695                           int(10) unsigned                    range   Coding region start
cdsEnd        15844                           int(10) unsigned                    range   Coding region end
exonCount     5                               int(10) unsigned                    range   Number of exons
exonStarts    14695,14969,15795,16853,29264,  longblob                                    Exon start positions
exonEnds      14829,15038,15947,17055,29617,  longblob                                    Exon end positions
score         0                               int(11)                             range
name2         chr1.1.001                      varchar(255)                        values  Alternate name (e.g. gene_id from GTF)
cdsStartStat  cmpl                            enum('none','unk','incmpl','cmpl')  values  Status of CDS start annotation (none, unknown, incomplete, or complete)
cdsEndStat    cmpl                            enum('none','unk','incmpl','cmpl')  values  Status of CDS end annotation (none, unknown, incomplete, or complete)
exonFrames    1,1,0,-1,-1,                    longblob                                    Exon frame {0,1,2}, or -1 if no frame for exon
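
The bin field follows the UCSC binning scheme. The sketch below is a minimal Python reconstruction, assuming the standard (non-extended) scheme: five bin levels from 128 kb up to 512 Mb, valid for coordinates below 2**29. The names bin_from_range and overlapping_bins are hypothetical and are not part of any UCSC library. Under that assumption the code reproduces the bin value of every sample row on this page.

```python
from typing import List

# Assumed: standard UCSC binning scheme, covering coordinates below 2**29 (512 Mb).
# Offsets of the five bin levels, from the smallest bins (128 kb) to the
# single largest bin (512 Mb): 585, 73, 9, 1, 0.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131,072 bases
BIN_NEXT_SHIFT = 3    # each level's bins are 8 times larger than the previous level's


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"{start}-{end} is outside the standard binning scheme")


def overlapping_bins(start: int, end: int) -> List[int]:
    """Return every bin that may hold a feature overlapping [start, end)."""
    bins: List[int] = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


if __name__ == "__main__":
    print(bin_from_range(14695, 29617))    # 585 (first sample row)
    print(bin_from_range(437769, 567804))  # 73: the range spans two 128 kb bins
    print(overlapping_bins(14695, 29617))  # [585, 73, 9, 1, 0]
```

In a range query for chrom, start, end, the bin column lets an index discard most rows cheaply. The query can first restrict bin to the values returned by overlapping_bins(start, end). It then applies the exact overlap test txStart < end AND txEnd > start.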
Connected Tables and Joining Fields
      hg19.nscanPep.name (via nscanGene.name)
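
The join can be sketched with Python's built-in sqlite3 module. This sketch assumes that nscanPep stores one row per prediction with columns name and seq. That layout is an assumption, because only the joining field is documented here. The tables are tiny in-memory stand-ins, and the peptide value is a placeholder, not real data.

```python
import sqlite3

# In-memory stand-ins for hg19.nscanGene and hg19.nscanPep. Only a few columns
# are created, the (name, seq) layout of nscanPep is an assumption, and the
# peptide value is a placeholder rather than real data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nscanGene (name TEXT, chrom TEXT, strand TEXT,
                            txStart INTEGER, txEnd INTEGER);
    CREATE TABLE nscanPep (name TEXT, seq TEXT);
    INSERT INTO nscanGene VALUES ('chr1.1.001.a', 'chr1', '-', 14695, 29617);
    INSERT INTO nscanGene VALUES ('chr1.1.002.a', 'chr1', '+', 65886, 70108);
    -- Only one placeholder peptide row, so the LEFT JOIN below yields a NULL.
    INSERT INTO nscanPep VALUES ('chr1.1.001.a', 'PLACEHOLDER');
    """
)

# Pair each prediction with its peptide through nscanGene.name = nscanPep.name.
# LEFT JOIN keeps predictions without a matching peptide row (seq is None).
rows = conn.execute(
    """
    SELECT g.name, g.chrom, g.txStart, g.txEnd, p.seq
    FROM nscanGene AS g
    LEFT JOIN nscanPep AS p ON p.name = g.name
    ORDER BY g.txStart
    """
).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The missing peptide for the second stand-in row is deliberate and only illustrates LEFT JOIN behavior. It says nothing about the contents of the real tables.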
Sample Rows
bin  name           chrom  strand  txStart  txEnd   cdsStart  cdsEnd  exonCount  exonStarts                                                       exonEnds                                                         score  name2        cdsStartStat  cdsEndStat  exonFrames
585  chr1.1.001.a   chr1   -       14695    29617   14695     15844   5          14695,14969,15795,16853,29264,                                   14829,15038,15947,17055,29617,                                   0      chr1.1.001   cmpl          cmpl        1,1,0,-1,-1,
585  chr1.1.002.a   chr1   +       65886    70108   69090     70008   2          65886,69036,                                                     65971,70108,                                                     0      chr1.1.002   cmpl          cmpl        -1,0,
586  chr1.1.003.a   chr1   -       139626   173945  139626    142957  7          139626,142946,155766,158592,165883,168099,173752,                139696,143011,155831,158674,165942,168165,173945,                0      chr1.1.003   cmpl          cmpl        2,0,-1,-1,-1,-1,-1,
586  chr1.1.004.a   chr1   -       228291   234222  228291    228654  2          228291,234187,                                                   228776,234222,                                                   0      chr1.1.004   cmpl          cmpl        0,-1,
587  chr1.1.005.a   chr1   +       334259   368634  367658    368597  3          334259,342391,367620,                                            334297,342603,368634,                                            0      chr1.1.005   cmpl          cmpl        -1,-1,0,
73   chr1.1.006.a   chr1   +       437769   567804  529661    567804  6          437769,439248,465211,470906,529614,567658,                       438152,439568,465339,471273,529725,567804,                       0      chr1.1.006   cmpl          cmpl        -1,-1,-1,-1,0,1,
588  chr1.pasa.1.a  chr1   -       453828   460462  453977    454166  2          453828,460379,                                                   454166,460462,                                                   0      chr1.pasa.1  cmpl          cmpl        0,-1,
589  chr1.pasa.2.a  chr1   +       564812   564881  564818    564851  1          564812,                                                          564881,                                                          0      chr1.pasa.2  cmpl          cmpl        0,
73   chr1.1.007.a   chr1   -       621095   655449  621095    622034  3          621095,639064,655411,                                            622072,639169,655449,                                            0      chr1.1.007   cmpl          cmpl        0,-1,-1,
590  chr1.1.008.a   chr1   -       661154   713962  661154    691467  9          661154,671766,691456,694346,701708,703927,708355,709550,713938,  661191,671910,691521,694503,701767,703993,708487,709660,713962,  0      chr1.1.008   cmpl          cmpl        2,2,0,-1,-1,-1,-1,-1,-1,
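
The comma-terminated exonStarts, exonEnds, and exonFrames lists can be cross-checked against cdsStart and cdsEnd. This sketch assumes a particular rule for exon frames. Each frame is taken to be the number of coding bases that precede the exon, counted in the direction of transcription, modulo 3. On the minus strand the exons are therefore walked from right to left. The rule is an assumption, but it reproduces the stored exonFrames of the sample rows used here. GenePred, parse_int_list, and compute_exon_frames are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


def parse_int_list(blob: str) -> List[int]:
    """Parse a comma-terminated list such as '14695,14969,' into integers."""
    return [int(x) for x in blob.split(",") if x]


@dataclass
class GenePred:
    """The subset of nscanGene columns needed to recompute exon frames."""
    name: str
    strand: str
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def compute_exon_frames(gp: GenePred) -> List[int]:
    """Assumed rule: frame = coding bases preceding the exon (in transcription
    order) modulo 3; -1 for exons that do not overlap [cdsStart, cdsEnd)."""
    n = len(gp.exon_starts)
    frames = [-1] * n
    # Walk exons 5' to 3': left to right on '+', right to left on '-'.
    order = range(n) if gp.strand == "+" else range(n - 1, -1, -1)
    coding_bases = 0
    for i in order:
        # Clip the exon to the CDS; all intervals are 0-based, half-open.
        start = max(gp.exon_starts[i], gp.cds_start)
        end = min(gp.exon_ends[i], gp.cds_end)
        if start < end:
            frames[i] = coding_bases % 3
            coding_bases += end - start
    return frames


# (name, strand, cdsStart, cdsEnd, exonStarts, exonEnds, exonFrames),
# copied from four of the sample rows above.
SAMPLE_ROWS = [
    ("chr1.1.001.a", "-", 14695, 15844,
     "14695,14969,15795,16853,29264,", "14829,15038,15947,17055,29617,",
     "1,1,0,-1,-1,"),
    ("chr1.1.002.a", "+", 69090, 70008,
     "65886,69036,", "65971,70108,", "-1,0,"),
    ("chr1.1.006.a", "+", 529661, 567804,
     "437769,439248,465211,470906,529614,567658,",
     "438152,439568,465339,471273,529725,567804,", "-1,-1,-1,-1,0,1,"),
    ("chr1.1.008.a", "-", 661154, 691467,
     "661154,671766,691456,694346,701708,703927,708355,709550,713938,",
     "661191,671910,691521,694503,701767,703993,708487,709660,713962,",
     "2,2,0,-1,-1,-1,-1,-1,-1,"),
]

if __name__ == "__main__":
    for name, strand, cds_s, cds_e, starts, ends, stored in SAMPLE_ROWS:
        gp = GenePred(name, strand, cds_s, cds_e,
                      parse_int_list(starts), parse_int_list(ends))
        computed = compute_exon_frames(gp)
        print(name, computed, computed == parse_int_list(stored))
```

Take chr1.1.001.a on the minus strand as an example. The rightmost coding exon contributes 49 bases (15795 to 15844). It gets frame 0, and the next exon to its left gets 49 % 3 = 1, matching the stored 1,1,0,-1,-1.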

Note: all start coordinates in our database are 0-based, not 1-based, while end coordinates are 1-based, so each stored interval is half-open: [start, end).
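
To convert a stored interval to the 1-based, fully closed form used in browser position strings, only the start changes. This is a minimal sketch; to_one_based and format_position are hypothetical helper names.

```python
from typing import Tuple


def to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open [start, end) interval, as stored in the
    database, to a 1-based, fully closed [start, end] interval."""
    # Only the start moves: the last base keeps the same number in both systems.
    return start + 1, end


def format_position(chrom: str, start: int, end: int) -> str:
    """Format a stored interval as a 1-based position string with thousands separators."""
    first, last = to_one_based(start, end)
    return f"{chrom}:{first:,}-{last:,}"


if __name__ == "__main__":
    # txStart/txEnd of chr1.1.001.a from the sample rows.
    print(format_position("chr1", 14695, 29617))  # chr1:14,696-29,617
    print(29617 - 14695)  # interval length: 14922
```

One advantage of the half-open convention is that the length of a stored interval is simply end - start, with no +1 correction. For the first sample row this is 14,922 bases, the same count as the closed range 14,696 to 29,617.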

N-SCAN (nscanGene) Track Description

Description

This track shows gene predictions made with the N-SCAN gene structure prediction software provided by the Computational Genomics Lab at Washington University in St. Louis, MO, USA.

Methods

N-SCAN combines biological-signal modeling in the target genome sequence with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN model, which uses a single target-informant genome pair, to allow an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions.

Human N-SCAN uses mouse (mm9) as the informant genome and applies iterative pseudogene masking.

Credits

Thanks to Michael Brent's Computational Genomics Group at Washington University in St. Louis for providing this data.

Special thanks to Aaron Tenney in the Brent lab and to Robert Zimmermann, currently at the Max F. Perutz Laboratories in Vienna, Austria, for this implementation of N-SCAN.

References

Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93.

Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17 Suppl 1:S140-8.

van Baren MJ, Brent MR. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res. 2006 May;16(5):678-85.

Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003 Oct 1;31(19):5654-66.