Schema for Geneid Genes - Geneid Gene Predictions
Database: hg19    Primary Table: geneid    Row Count: 33,428
Format description: A gene prediction with some additional info.
field         example                         SQL type                            info    description
bin           585                             smallint(5) unsigned                range   Indexing field to speed chromosome range queries.
name          chr1_1.1                        varchar(255)                        values  Name of gene (usually transcript_id from GTF)
chrom         chr1                            varchar(255)                        values  Reference sequence chromosome or scaffold
strand        -                               char(1)                             values  + or - for strand
txStart       14695                           int(10) unsigned                    range   Transcription start position
txEnd         35736                           int(10) unsigned                    range   Transcription end position
cdsStart      14695                           int(10) unsigned                    range   Coding region start
cdsEnd        35736                           int(10) unsigned                    range   Coding region end
exonCount     10                              int(10) unsigned                    range   Number of exons
exonStarts    14695,14969,15795,16853,172...  longblob                                    Exon start positions
exonEnds      14829,15038,15891,17055,172...  longblob                                    Exon end positions
score         0                               int(11)                             range
name2         chr1_1                          varchar(255)                        values  Alternate name (e.g. gene_id from GTF)
cdsStartStat  cmpl                            enum('none','unk','incmpl','cmpl')  values  enum('none','unk','incmpl','cmpl')
cdsEndStat    cmpl                            enum('none','unk','incmpl','cmpl')  values  enum('none','unk','incmpl','cmpl')
exonFrames    1,1,1,0,2,0,0,2,1,0,            longblob                                    Exon frame {0,1,2}, or -1 if no frame for exon
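
The bin value can be reproduced from txStart and txEnd. The sketch below assumes the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger than the one below); the constants and the function name bin_from_range are illustrative and are not given on this page.

```python
# Assumed constants of the standard UCSC binning scheme (not stated on this page).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)  # first bin number of each level
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a bin boundary at this level; try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme (max 512 Mb)")


# txStart/txEnd of chr1_1.1 and chr1_7.1 from the sample rows.
small_gene_bin = bin_from_range(14695, 35736)
large_gene_bin = bin_from_range(621095, 805472)
```

Under these assumptions the function yields the bin values shown in the sample rows: a gene that fits inside one 128 kb bin lands in the finest level (585 and up), while a gene that crosses a 128 kb boundary, such as chr1_7.1, moves to a coarser level.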
Sample Rows
bin  name  chrom  strand  txStart  txEnd  cdsStart  cdsEnd  exonCount  exonStarts  exonEnds  score  name2  cdsStartStat  cdsEndStat  exonFrames
585  chr1_1.1  chr1  -  14695  35736  14695  35736  10  14695,14969,15795,16853,17232,17605,17914,18267,24737,35720,  14829,15038,15891,17055,17257,17742,18061,18379,24891,35736,  0  chr1_1  cmpl  cmpl  1,1,1,0,2,0,0,2,1,0,
586  chr1_2.1  chr1  -  228291  228654  228291  228654  1  228291,  228654,  0  chr1_2  cmpl  cmpl  0,
586  chr1_3.1  chr1  +  234205  234559  234205  234559  1  234205,  234559,  0  chr1_3  cmpl  cmpl  0,
587  chr1_4.1  chr1  -  321689  331663  321689  331663  2  321689,331504,  321818,331663,  0  chr1_4  cmpl  cmpl  0,0,
587  chr1_5.1  chr1  +  334155  368597  334155  368597  2  334155,367620,  334297,368597,  0  chr1_5  cmpl  cmpl  0,1,
589  chr1_6.1  chr1  +  558653  569756  558653  569756  3  558653,567505,569499,  558763,567819,569756,  0  chr1_6  cmpl  cmpl  0,2,1,
73  chr1_7.1  chr1  -  621095  805472  621095  805472  9  621095,655411,671766,675182,714118,741178,770817,787588,805150,  622072,655629,671982,675415,714673,741271,770978,787786,805472,  0  chr1_7  cmpl  cmpl  1,2,2,0,0,0,1,1,0,
591  chr1_8.1  chr1  +  808832  809729  808832  809729  1  808832,  809729,  0  chr1_8  cmpl  cmpl  0,
591  chr1_9.1  chr1  +  857387  879533  857387  879533  13  857387,858952,860071,865553,866418,871151,874419,874651,877789,877938,878551,879077,879287,  857398,859156,860328,865716,866469,871276,874509,874792,877868,878438,878757,879188,879533,  0  chr1_9  cmpl  cmpl  0,2,2,1,2,2,1,1,1,2,1,0,0,
591  chr1_10.1  chr1  -  880073  894620  880073  894620  17  880073,880897,881552,881781,883510,883869,886506,887379,887791,889161,889383,891302,891474,892273,892478,894308,894594,  880180,881033,881666,881925,883612,883983,886618,887519,887980,889272,889462,891393,891595,892423,892653,894461,894620,  0  chr1_10  cmpl  cmpl  1,0,0,0,0,0,2,0,0,0,2,1,0,0,2,2,0,
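
A row like those above can be turned into typed values with a few lines of code. The sketch below assumes the table has been exported as tab-separated text with the sixteen columns in schema order; the helper names parse_int_list and parse_row are hypothetical.

```python
from typing import Dict, List

# Column names in schema order.
FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]
INT_FIELDS = ["bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "score"]
LIST_FIELDS = ["exonStarts", "exonEnds", "exonFrames"]


def parse_int_list(text: str) -> List[int]:
    """Parse a comma-separated list such as '0,2,1,' (a trailing comma is allowed)."""
    return [int(item) for item in text.split(",") if item]


def parse_row(line: str) -> Dict[str, object]:
    """Parse one tab-separated row (assumed export format) into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row: Dict[str, object] = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        row[key] = int(values[FIELDS.index(key)])
    for key in LIST_FIELDS:
        row[key] = parse_int_list(values[FIELDS.index(key)])
    # Each per-exon list should have exactly exonCount entries.
    for key in LIST_FIELDS:
        if len(row[key]) != row["exonCount"]:
            raise ValueError(f"{key} does not have exonCount entries")
    return row


# The chr1_6.1 sample row, rebuilt here with explicit tab separators.
example_line = "\t".join([
    "589", "chr1_6.1", "chr1", "+", "558653", "569756", "558653", "569756", "3",
    "558653,567505,569499,", "558763,567819,569756,", "0", "chr1_6",
    "cmpl", "cmpl", "0,2,1,",
])
example_row = parse_row(example_line)
```

The exonCount check is a cheap consistency test that catches truncated or misaligned rows early.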

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are 1-based, so each feature is stored as a half-open interval.
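
To make the convention concrete, the sketch below assumes that end coordinates are exclusive (the usual UCSC half-open convention), so a length is simply end minus start and only the start needs adjusting for a 1-based display. The function names to_one_based and exon_lengths are hypothetical.

```python
from typing import List


def to_one_based(chrom: str, start: int, end: int) -> str:
    """Format a 0-based, half-open range as a 1-based, fully closed position string."""
    return f"{chrom}:{start + 1}-{end}"


def exon_lengths(exon_starts: List[int], exon_ends: List[int]) -> List[int]:
    """Exon lengths for half-open coordinates: no +1 correction is needed."""
    return [end - start for start, end in zip(exon_starts, exon_ends)]


# txStart/txEnd of chr1_1.1 and the three exons of chr1_6.1 from the sample rows.
position = to_one_based("chr1", 14695, 35736)
lengths = exon_lengths([558653, 567505, 569499], [558763, 567819, 569756])
```

For chr1_6.1 the three exon lengths sum to a multiple of three, as expected for a prediction whose exons are entirely coding (cdsStart equals txStart and cdsEnd equals txEnd).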

Geneid Genes (geneid) Track Description

Description

This track shows gene predictions from the geneid program developed at the Genome Bioinformatics Laboratory (GBL), which is part of the Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM) / Centre de Regulació Genòmica (CRG) in Barcelona.

Methods

Geneid is a hierarchically structured program for predicting genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). Next, exons are built from the sites. Each exon is scored as the sum of the scores of its defining sites plus the log-likelihood ratio of a Markov model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons by maximizing the sum of the scores of the assembled exons.
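
The scoring and assembly steps can be illustrated with a toy example. This is a minimal sketch, not geneid's actual implementation: it assumes each candidate exon already carries two site scores and a coding log-likelihood ratio (all values below are invented), and it models assembly as choosing the non-overlapping chain of exons with the highest total score. Constraints a real gene assembler would need, such as reading-frame compatibility, strand, and exon type, are left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    start: int                        # 0-based, inclusive
    end: int                          # exclusive
    site_scores: Tuple[float, float]  # scores of the two defining sites (invented values)
    coding_llr: float                 # log-likelihood ratio of the coding model (invented)

    @property
    def score(self) -> float:
        """Exon score: sum of the defining site scores plus the coding log-likelihood ratio."""
        return sum(self.site_scores) + self.coding_llr


def assemble(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Pick the non-overlapping chain of exons with the maximum total score."""
    ordered = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in ordered]
    # best[i] holds the best (score, chain) using only the first i exons.
    best: List[Tuple[float, List[Exon]]] = [(0.0, [])]
    for i, exon in enumerate(ordered):
        # j = number of earlier exons that end at or before this exon's start.
        j = bisect_right(ends, exon.start, 0, i)
        take_score = best[j][0] + exon.score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [exon]))
        else:
            best.append(best[i])
    return best[-1]


# Four hypothetical candidate exons; a and b overlap, as do b and c, and c and d.
exon_a = Exon(0, 100, (1.0, 2.0), 3.0)
exon_b = Exon(50, 150, (2.0, 2.0), 5.0)
exon_c = Exon(100, 200, (1.0, 1.0), 2.5)
exon_d = Exon(150, 250, (0.5, 0.5), 1.0)
total_score, gene = assemble([exon_a, exon_b, exon_c, exon_d])
```

Sorting by end position and looking up the last compatible exon with a binary search keeps the assembly at O(n log n) in the number of candidate exons, which matters because the exon-building step typically produces far more candidates than end up in the final gene structure.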

Credits

Thanks to GBL for providing these data.