Search and Translate
   Simple gene/gene model search
   Advanced gene/gene model search
   Search for Gene Models by sequence
   Translate Gene Model IDs
   Search with MaizeMine
   Search pan-genes
Downloads
   Download Files
   Download By Region
   Download Sequence for Gene Model List
   Gene Model Annotations and Orthologs
   Gene Models with Associated Genes
   Insertion data sets
   Gene model cross references
   All associated data for a gene list
   Older gene model downloads
   MaizeMine downloads
Information
   About the current gene model set
   Gene symbol list
   The B73 assembly and annotation
   NCBI annotation releases
   Classical Maize Genes
   Nomenclature
   Gene Model Terms
   Gene Models With Issues


Simple Search: This search form allows you to find gene loci and gene models given basic information (locus name, Gene Model ID, Transcript ID, Translation ID, Gene symbol, Gene name), including partial names.
Use the wildcards '%' or '*' to find matches that contain your search term. A '^' at the beginning of the search term finds matches that start with that term. A '$' at the end of the search term finds matches that end with that term.


Submit (see a sample gene model query or locus query)

    (upper limit on results is 2,000 records)
Display records per page.

More Examples: lg1, liguleless1, Zm00001eb067740, GRMZM2G036297, DAA35605, Zm00001d002005_T001
Wildcards: Compare "starts with lg1" to "ends with lg1"
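
The wildcard rules above can be sketched as a translation into a regular expression. This is only an illustration of the described behavior, not the code behind the search form: the function names are hypothetical, and the choices to ignore case and to treat an unanchored term as a "contains" match are assumptions.

```python
import re


def search_term_to_regex(term: str) -> "re.Pattern[str]":
    """Translate a Simple Search term into a compiled regular expression.

    Assumed semantics: '%' and '*' match any run of characters, a leading
    '^' anchors the match at the start of the name, and a trailing '$'
    anchors it at the end.
    """
    anchored_start = term.startswith("^")
    anchored_end = term.endswith("$")
    core = term[1:] if anchored_start else term
    core = core[:-1] if anchored_end else core
    # Escape everything except the wildcards, which become '.*'.
    parts = re.split(r"[%*]", core)
    body = ".*".join(re.escape(part) for part in parts)
    pattern = ("^" if anchored_start else "") + body + ("$" if anchored_end else "")
    # Case-insensitive matching is an assumption, not documented behavior.
    return re.compile(pattern, re.IGNORECASE)


def matches(term: str, name: str) -> bool:
    """Return True if the name would be found by the given search term."""
    return search_term_to_regex(term).search(name) is not None
```

With this sketch, `matches("^lg1", "lg10")` is true while `matches("lg1$", "lg10")` is false, which mirrors the "starts with" versus "ends with" comparison.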



Advanced Search

Check the boxes next to the fields you want to search; if you just want to find records that have any value for that attribute, check the box and leave the criteria alone.



Search for Gene Models by Sequence

Enter up to 5 sequences, Genbank IDs or gene model names (Zmdddddadddddd, ZEAMMB73_xxxx or GRMZMxxxxxx): Sample
Nucleotide
Amino acid
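
To pre-check a list before submitting it, the three gene model name styles mentioned above can be recognized with regular expressions. The patterns below are guesses based only on the examples shown on this page (for instance, allowing one or two letters after the five digits so that both Zm00001d002005 and Zm00001eb067740 match). They are not an official specification of the naming scheme, and the function name is hypothetical.

```python
import re
from typing import Optional

# Illustrative patterns inferred from the examples on this page (assumptions).
NAME_PATTERNS = {
    "Zm": re.compile(r"^Zm\d{5}[a-z]{1,2}\d{6}$"),
    "ZEAMMB73": re.compile(r"^ZEAMMB73_\w+$"),
    "GRMZM": re.compile(r"^GRMZM\w+$"),
}


def gene_model_name_style(name: str) -> Optional[str]:
    """Return the matching name style, or None if no pattern matches."""
    for style, pattern in NAME_PATTERNS.items():
        if pattern.match(name):
            return style
    return None
```

A name that returns `None` (for example a GenBank ID or a transcript ID) is not necessarily invalid input for the form; it simply does not look like one of the three gene model name styles.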




Translate Gene Model IDs - download

   Enter list - 8,000 gene model limit: (Example list)
   
Translate to:


 

Alternatively, download the full gene model associations list between the B73 assemblies and all other assemblies in our database

Download By Region and Annotation


Annotation:
Chromosome:
Model type:
Data type:
Start position:
End position:
(enter positions w/o commas or spaces, or leave both empty for entire chromosome)
   Enter two markers to get gene models within the span. Both must be on the same chromosome. Only applicable for assemblies with aligned markers.
Enter a list of gene models, transcripts, and/or proteins to retrieve their physical coordinates.
Output type for gene model list:       Submit
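
The same kind of region query can be approximated offline against a downloaded gene model GFF file. The sketch below assumes standard nine-column, tab-separated GFF3 lines and keeps features that overlap the requested span. Whether the download form returns overlapping or only fully contained gene models is not stated here, so the overlap rule is an assumption, and the function names are hypothetical.

```python
def parse_position(text: str) -> int:
    """Convert a position such as '1,234,567' or '1 234 567' to an integer."""
    return int(text.replace(",", "").replace(" ", ""))


def features_in_region(gff_lines, chromosome, start=None, end=None, feature_type="gene"):
    """Return (attributes, start, end) for features overlapping the region.

    Leaving both start and end as None selects the entire chromosome.
    """
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard GFF3 feature line
        seqid, _source, ftype, fstart, fend = cols[:5]
        if seqid != chromosome or ftype != feature_type:
            continue
        fstart, fend = int(fstart), int(fend)
        if start is not None and fend < start:
            continue  # feature lies entirely before the region
        if end is not None and fstart > end:
            continue  # feature lies entirely after the region
        hits.append((cols[8], fstart, fend))
    return hits
```

For example, `features_in_region(open("genes.gff3"), "chr1", parse_position("1,000,000"), parse_position("2,000,000"))` would list the gene features on chr1 that overlap that span, where `genes.gff3` and the chromosome name are placeholders for whatever the downloaded file actually uses.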

Download Sequence for Gene Model List

When downloading sequence please specify which type of input you are entering. For genomic please use the gene model name (e.g. Zm00001eb067740). For cDNA, CDS, and mRNA please use the transcript ID (e.g. Zm00001eb067740_T001). For protein please use the translation ID (e.g. Zm00001eb067740_P001). If you enter only the gene model ID, please choose if you want to see all transcripts or only the canonical transcripts.
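
The three input types can be told apart by their suffix, as in the examples above (_T001 for transcripts, _P001 for translations, no suffix for gene models). The following sketch classifies an identifier and derives a translation ID from a transcript ID. It assumes that the transcript and translation numbers correspond, as they do in the example IDs; the helper names are hypothetical.

```python
import re

# Gene model ID, optionally followed by _T<number> or _P<number>.
ID_RE = re.compile(r"^(?P<gene>.+?)(?:_(?P<kind>[TP])(?P<num>\d+))?$")


def input_type(identifier: str) -> str:
    """Classify an ID as 'gene model', 'transcript', or 'translation'."""
    m = ID_RE.match(identifier)
    if m is None:
        raise ValueError(f"not a usable identifier: {identifier!r}")
    return {"T": "transcript", "P": "translation", None: "gene model"}[m.group("kind")]


def gene_model_for(identifier: str) -> str:
    """Strip a transcript or translation suffix, if present."""
    m = ID_RE.match(identifier)
    if m is None:
        raise ValueError(f"not a usable identifier: {identifier!r}")
    return m.group("gene")


def translation_id_for(transcript_id: str) -> str:
    """Derive the translation ID, assuming matching _T/_P numbering."""
    m = ID_RE.match(transcript_id)
    if m is None or m.group("kind") != "T":
        raise ValueError(f"not a transcript ID: {transcript_id!r}")
    return f"{m.group('gene')}_P{m.group('num')}"
```

For instance, `input_type("Zm00001eb067740_T001")` gives `"transcript"`, which would call for a cDNA, CDS, or mRNA output type rather than genomic or protein.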

   Enter list - 8,000 gene model limit: (Example list)
   
Input type:
Output type:


Submit


Gene Model Downloads

The current gene model set for the representative maize genome, Zm-B73-REFERENCE-NAM-5.0 (B73 v5), is Zm00001eb.1.



Zm00001eb.1 gene model GFF
Zm00001eb.1 gene model cDNA fasta
Zm00001eb.1 gene model CDS fasta
Zm00001eb.1 gene model genomic fasta
Zm00001eb.1 gene model protein fasta



Gene model cross-references and pan-genes

Please note that there is not a 1-to-1 correspondence between all gene models in all annotations. Some gene models are unique to specific genome assemblies, and some have been split or merged between annotation or assembly versions. Where a direct association is difficult to calculate, multiple gene models that are similar in sequence and position may be listed. Some gene models may also be similar in sequence but not located in the same syntenic positions.
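
Because one gene model can correspond to none, one, or several gene models in another annotation, a cross-reference is better held as a mapping from each ID to a list of IDs than as a simple one-to-one lookup. The sketch below assumes the associations have already been read into (source ID, target ID) pairs. The layout of the actual download file is not described here, so the input shape and all names are hypothetical.

```python
from collections import defaultdict


def build_cross_reference(pairs):
    """Group (source_id, target_id) pairs into source_id -> [target_ids]."""
    xref = defaultdict(list)
    for source_id, target_id in pairs:
        if target_id not in xref[source_id]:
            xref[source_id].append(target_id)  # keep first-seen order, no repeats
    return dict(xref)


def match_status(xref, source_id):
    """Describe how many gene models a source ID is associated with."""
    targets = xref.get(source_id, [])
    if not targets:
        return "no match"
    return "single match" if len(targets) == 1 else "multiple matches"
```

A "multiple matches" result corresponds to the split or ambiguous cases described above. Detecting merges would need the same grouping built in the opposite direction, from target ID to source IDs.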



Gene model associations across all B73 assemblies.
Current pan-gene data


Older gene model downloads

Gene model set Zm00001d.2 corresponds to Gramene release 36.
Zm00001d.2 gene model cDNA fasta
Zm00001d.2 gene model ncRNA fasta
Zm00001d.2 gene model translations fasta
Zm00001d.2 gene model GFF3

Gene model set Zm00001d.1 corresponds to Gramene release 32. (Requires EnsemblPlant login or download as 'Guest')
Zm00001d.1 gene model cDNA fasta
Zm00001d.1 gene model ncRNA fasta
Zm00001d.1 gene model translations fasta
Zm00001d.1 gene model GFF3

Gene model set 5b+ for B73 RefGen v3 corresponds to Gramene release 21.
5b+ gene model cDNA fasta
5b+ gene model ncRNA fasta
5b+ gene model translations fasta
5b+ gene model GFF3
B73 RefGen_v3 MAKER-P gene models

Gene model set Zm00001d.provisional holds low confidence gene models that were not included in the Zm00001d.2 annotation.
Zm00001d.provisional (low confidence) gene model GFFs
Zm00001d.provisional (low confidence) gene model transcripts
Zm00001d.provisional (low confidence) gene model proteins
Cross reference for 5b+ GRMZM and ZEAMMB73 IDs
5b.60: Filtered Gene Set for B73_RefGen_v2
5a.59: Working Gene Set for B73_RefGen_v2
4a.53: Filtered Gene Set for B73_RefGen_v1
4a.53: Working Gene Set for B73_RefGen_v1


Download all data for a list of gene models

Enter a list of B73 gene models, separated by newlines, commas, spaces, or semicolons.
Note: this can take several minutes, even for a short list of gene models.

Enter a list of gene models

Or upload a file (Maximum file size is 50kb)
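
A list in any of the accepted separator styles can be normalized before pasting or uploading. This is a minimal sketch of splitting on newlines, commas, spaces, or semicolons. Removing duplicates is an added convenience rather than documented behavior of the form, and the function name is hypothetical.

```python
import re


def parse_gene_model_list(text: str) -> list:
    """Split text on whitespace, commas, or semicolons, dropping duplicates."""
    seen = set()
    result = []
    for token in re.split(r"[\s,;]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            result.append(token)
    return result
```

Writing the result back out with `"\n".join(...)` gives one gene model per line, which is one of the accepted input layouts.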
  

MaizeMine

MaizeMine provides an alternative view into the MaizeGDB data, with an emphasis on genomic, gene expression, and metabolic data.

You can see the list of data sources here.
The MaizeMine tutorial is here.

The "Quick Search" feature on the MaizeMine home page gives quick access to individual gene models, pathways, proteins, et cetera.

To get data associated with a set of gene models, such as expression levels, pathways, or GO terms, start by creating a list. You can then attach data to the list and analyze it in multiple ways.

B73 Reference Genome Assembly and Gene Model Issues

We need your help! Please report any assembly or gene model structure problems. This includes misassembled regions, evidence for closing gaps, gene models that should be merged or split, evidence supporting low-confidence gene models, et cetera. All issues will be shared with the maize community and with the team charged with improving the B73 assembly and gene models.


All gene model issues

All assembly issues



About the Current Gene Model Set

The current gene model set (i.e. structural assembly annotation) is Zm00001eb.1.

See the 2016 Whole-Genome Assembly and Annotation nomenclature document for an explanation of the assembly and annotation identifiers. This nomenclature was first adopted for the Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d assembly and structural annotation, and has been used for subsequent assemblies and annotations of B73 and other accessions.


The Zm00001eb.1 gene model set for Zm-B73-REFERENCE-NAM-5.0 is the current recommended set. Other gene model sets are provided for comparison.


Gene model sets and assemblies:
Set           Assembly                        Gramene/EnsemblPlant version
Zm00001eb.1   Zm-B73-REFERENCE-NAM-5.0
Zm00001d.2    Zm-B73-REFERENCE-GRAMENE-4.0    36/54
Zm00001d.1    Zm-B73-REFERENCE-GRAMENE-4.0    32/50
5b+           B73 RefGen_v3                   18/36 - 31/49
5b            B73 RefGen_v2                   7/25 - 17/37
4a            B73 RefGen_v1




Reference gene model releases
Gene models for the B73 genome assembly are provided at both MaizeGDB and Gramene. Nomenclature guidelines for gene models, as agreed to by the maize research community, indicate that gene model sets are named with the associated assembly identifier. For the B73 reference genome, this is Zm00001d. Gramene, which manages these gene models, uses a different versioning system.

Bold font indicates the current official gene model set.

   
Version       Gramene version   Date                Changes
Zm00001eb.1                     1/15/21
Zm00001e.1                      1/9/20              NOTE: preliminary; withdrawn when Zm00001eb.1 was released
              v38/56 - v43/61   12/7/17 - 3/15/19   Changes limited to gene models outside the reference set (ENSRNA, ncRNA, and inferred organelle gene models)
              v37/55            09/21/17            3722 new non-coding gene models, using non-standard prefix, "ENSRNA"; 2318 ncRNA gene models have changed transcripts
Zm00001d.2    v36/54            06/07/17            transcripts changed for 28 miRNAs
              v35/53            04/02/17            published in Nature; transcripts changed for 547 gene models
              v34/52            12/14/16            transcripts changed for 28 miRNAs
              v33/51                                174 Mt and Pt gene models added, transcripts changed for 3127 gene models
Zm00001d.1    v32/50            09/28/16            initial release

Gene Model Functional Annotations and Orthologs

Zm-B73-REFERENCE-NAM-5.0
InterproScan results (Also available in the MaizeGDB downloads for all NAM founder assemblies)

Phytozome
Download files include functional annotations for B73 RefGen_v2 ("Ensembl-18") and B73 RefGen_v4, and orthologs for B73 RefGen_v4. (account required)

B73 RefGen_v2
Gramene.org: Functional Annotations (B73 RefGen_v2 only)
Freeling Lab: Syntenic Orthologs (mapped to RefGen_v2)

Gene Models with Associated Genes

(B73 RefGen_v3 and Zm-B73-REFERENCE-GRAMENE-4.0, aka B73 RefGen_v4)

Classical Genes:   table   tab delimited
MaizeGDB curated genes:   table   tab delimited
All associated genes:   table   tab delimited

Insertion data sets

UniformMu

  About the UniformMu project
W22 to B73 cross-reference:
  Excel spreadsheet
Genomic coordinates for Zm-B73-REFERENCE-NAM-5.0:
   Release 9 Excel spreadsheet
Genomic coordinates for Zm-B73-REFERENCE-GRAMENE-4.0 (aka B73 RefGen_v4):
   Release 9 Excel spreadsheet
   Release 9 Excel spreadsheet with gene structure
List of gene models from the B73 RefGen_v3 Filtered Gene Set that have UniformMu insertions:
   Release 8 Excel spreadsheet
List of gene models from the B73 RefGen_v2 Filtered Gene Set that have UniformMu insertions including 100 bp upstream or downstream:
   Release 7 Excel spreadsheet
   Release 8 Excel spreadsheet
List of gene models from the B73 RefGen_v2 Filtered Gene Set that have UniformMu insertions in exons:
   Release 7 Excel spreadsheet
   Release 8 Excel spreadsheet

Ac/Ds-GFP

  About the Dooner & Du Ac/Ds-GFP project
Insertions validated by Warman et al., 2020
  Validation table with B73 v3 and v4 gene model assignments

Zm-B73-REFERENCE-NAM-5.0/Zm00001eb.1 Information

In-depth metadata for Zm-B73-REFERENCE-NAM-5.0 is available here.
See the paper for B73 RefGen_v1 here, and for Zm-B73-REFERENCE-GRAMENE-4.0 here.

Counts for each chromosome.
Chromosome      Accession     Length         Protein Coding   Transposable Element
Chromosome 1    LR618874.1    308,452,471    5892             227,345
Chromosome 2    LR618875.1    243,675,191    4751             176,504
Chromosome 3    LR618876.1    238,017,767    4103             173,251
Chromosome 4    LR618877.1    250,330,460    4093             183,689
Chromosome 5    LR618878.1    226,353,449    4485             160,922
Chromosome 6    LR618879.1    181,357,234    3412             129,220
Chromosome 7    LR618880.1    185,808,916    3070             141,993
Chromosome 8    LR618881.1    182,411,202    3536             130,992
Chromosome 9    LR618882.1    163,004,744    2988             117,200
Chromosome 10   LR618883.1    152,435,371    2705             112,766
Unmapped                                     5892             23,216
Nuclear Total                 ~2,182,000     39,756           1,577,104
Annotations: Zm00001eb.1 NCBI 103


Zm-B73-REFERENCE-NAM-5.0/Zm00001eb.1 Stats


Gene Feature                              Value
Average protein-coding transcript size    5376 bp
Longest transcript                        745,091 bp (Zm00001eb334630_T004)
Average transposable element size         1638 bp
Average exon size                         290 bp
Average number of exons per gene          6 exons
Maximum exons per gene                    80 exons (Zm00001eb126710_T002)
Average coding region size                1816 bp

NCBI annotation releases

The NCBI B73_v5 annotation release 103 for B73 v5 assembly, the NCBI B73_v4 annotation release 101 and NCBI B73_v4 annotation release 102 for the B73 v4 assembly, and the NCBI B73_v3 annotation release 100 were developed at NCBI using the NCBI Eukaryotic Genome Annotation Pipeline. The final set of annotated features comprises, in order of preference, pre-existing RefSeq sequences and a subset of well-supported Gnomon-predicted models. It is built by evaluating together at each locus the known RefSeq transcripts, the features projected from curated RefSeq genomic alignments and the models predicted by Gnomon.

Nomenclature

To ensure consistency across genomes and to better enable pan-genome analyses, MaizeGDB is the single naming authority for the assignment of identifiers for genome assemblies and annotations.

A quick explanation of assembly and annotation identifiers is here.
The full, detailed document describing the maize assembly and annotation nomenclature can be downloaded here,
and the complete maize nomenclature guidelines, including those for loci, are here.


Gene Model Terms


Associated Genes: Associated Genes are genes that have been linked to a gene model by hand curation.


Canonical: The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA. Note: a canonical transcript is not always the first transcript (T01) or the longest transcript.

Non-canonical: All transcripts for a gene model other than the canonical transcript.
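
The canonical-transcript rule can be sketched as follows: pick the transcript with the longest CDS if any transcript is translated, otherwise the one with the longest cDNA. The `Transcript` record and its fields are hypothetical, and breaking ties by taking the first transcript encountered is an assumption, since the definition above does not say how ties are resolved.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    cdna_length: int
    cds_length: int = 0  # 0 means the transcript is not translated


def canonical_transcript(transcripts):
    """Return the canonical transcript from a gene model's transcripts."""
    if not transcripts:
        raise ValueError("gene model has no transcripts")
    translated = [t for t in transcripts if t.cds_length > 0]
    if translated:
        # Longest CDS wins; max() keeps the first one on ties (assumption).
        return max(translated, key=lambda t: t.cds_length)
    # No translated transcripts: fall back to the longest cDNA.
    return max(transcripts, key=lambda t: t.cdna_length)
```

Given a first transcript with a 3000 bp cDNA and a 900 bp CDS and a second with a 2500 bp cDNA and a 1200 bp CDS, the second is canonical, which shows why the canonical transcript is not always the first or the longest one.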


Evidence Type: The source of evidence to support the gene model.


Model Types:

Protein Coding: A gene model with supporting evidence.
miRNA: Small, non-coding RNA.
TE: Transposable elements.
Low Confidence: A gene model with little or no supporting evidence.
WGS: (Versions 5a.59 and earlier) Working Gene Set. This set merges new annotations performed on RefGen_v2 with RefGen_v1 4a gene models mapped onto V2. New annotations were achieved by an evidence-based method (Gramene GeneBuilder) and complemented with de novo Fgenesh models performed on masked DNA.
FGS: (Versions 5b.60 and earlier) Filtered Gene Set. The filtered set was generated by screening the working set to remove pseudogenes, TE-encoded genes, and low-confidence hypothetical models.


Transcript Classes:

WH. With homology to a known non-transposable element in the NR (non-redundant) database at GenBank. Protein-coding gene.
NH. No homology in the NR (non-redundant) database at GenBank. Hypothetical gene or pseudogene.
TE. With homology to a known transposable element (TE) in the NR (non-redundant) database at GenBank. Transposable element.


Discussion of Gene Data



What is a gene?

A gene is a stretch of DNA sequence, a segment of which is regularly or conditionally transcribed at some time in an organism. The DNA is understood to include not only the exons and introns of the structural gene but also the cis 5' and 3' regions in which a sequence change can affect gene expression.


What is a gene model?

A Gene Model is a representation of an mRNA transcript of a gene that contains information about features of the transcript such as exon-intron boundaries, splice sites, UTRs, etc. Due to alternative splicing of mRNA transcripts, there may be more than one gene model for any given gene.