History of Maize Genome Assemblies and Annotations



If you need to document involvement of MaizeGDB in your planned assembly or annotation efforts, contact Carson Andorf for a letter of collaboration.




B73 GENOME ASSEMBLY

It is imperative that the community work from the same genome coordinate system across projects so that the data generated by various groups can be fully leveraged and displayed in a comparable manner. Like many Model Organism Databases, MaizeGDB is charged with facilitating this process and is committed to releasing official genome assemblies as they are made available.


November 2009
The first complete assembly of B73 as the representative genome for maize, B73 RefGen_v1, was published in Science: The B73 Maize Genome: Complexity, Diversity, and Dynamics. Schnable et al. 2009.


November 2010
B73 RefGen_v2 was released as the default view of the assembly at MaizeGDB. This version was calculated by the Maize Genome Sequencing Consortium and became available via GenBank on December 7th, 2012. The project record is 10769.


April 2013
The next version, B73 RefGen_v3, became the default assembly view of the MaizeGDB Genome Browser in April 2013. RefGen_v3 was not a global re-assembly. Instead, it used Roche/454 reads produced from a whole genome shotgun (WGS) sequencing library to capture missing gene space within and between the original BACs. The 454 reads were assembled into contigs with ABySS and aligned to the B73 RefGen_v2 assembly to identify new contiguous pieces of DNA sequence that were not already represented in the v2 assembly. In addition, ~65,000 full-length cDNAs (FLcDNAs from the Maize Full Length cDNA project; more information here and here) were aligned to both the B73 RefGen_v2 contigs and the new contigs. B73 RefGen_v3 was the final product of the Maize Genome Sequencing Consortium.


August 2016
An entirely new assembly of the maize genome, Zm-B73-REFERENCE-GRAMENE-4.0 (aka B73 RefGen_v4), was constructed from PacBio Single Molecule Real-Time (SMRT) sequencing at approximately 60-fold coverage and scaffolded with the aid of a high-resolution whole-genome restriction (optical) map. This new assembly was constructed without the assistance of the BAC physical map that had been used to guide the previous v1-v3 assemblies. The pseudomolecules of maize B73 RefGen_v4 were assembled nearly end-to-end, representing a 52-fold improvement in average contig size relative to the previous reference (B73 RefGen_v3). B73 RefGen_v4 was funded by the NSF IOS #1112127 award to Gramene.


January 2020
Version 5.0 of the B73 (PI 677128) genome assembly was released as Zm-B73-REFERENCE-NAM-5.0, along with the 25 maize NAM founders, by the NAM Consortium. The assembly was produced using PacBio long reads and a mate-pair strategy. Scaffolds were validated by BioNano optical mapping, then ordered and oriented using linkage and pan-genome marker data.



THE NAM FOUNDERS ASSEMBLIES

January 2020
Genome assemblies of the 25 NAM founders provide a valuable addition to genomic resources for maize, spanning a broad cross-section of maize diversity. This set of genomes complements the B73 representative maize genome by adding pan-genome sequence not present in B73. These assemblies will provide insights into genomic diversity across maize in addition to providing the maize research community with a pan-genome. The assemblies were produced by the NAM Sequencing Consortium using PacBio long reads and a mate-pair strategy. Scaffolds were validated by BioNano optical mapping, then ordered and oriented using linkage and pan-genome marker data.
De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Hufford et al., 2021



ADDITIONAL GENOME ASSEMBLIES

For many years, the B73 genome was the only reference-quality genome assembly available for maize due to the high costs of sequencing and assembling a large (~2.1 Gb) genome. Although B73 remains the representative genome assembly for maize, many more high-quality assemblies are now available, and their numbers are expected to increase as the cost of sequencing technologies continues to drop and assembly methods improve. Increasingly, these genome assemblies are completed in sets (for example, the NAM founders) and can be used both within their sets and in conjunction with all other available genome assemblies. Unless requested to withdraw a genome assembly version, MaizeGDB will maintain all past assembly data even if improved versions are produced. Detailed information on those genome assemblies can be found here.



PAN-GENE ANALYSES

With the 25 maize NAM founder genome assemblies and annotations, along with version 5.0 of the B73 genome, the ability to construct comprehensive pan-genomes became a reality. The first set of maize pan-genes, calculated on 58 annotations, was produced by MaizeGDB staff using the Pandagma pipeline and presented in a new pan-gene data center in November 2023.



GENOME ASSEMBLY AND GENE MODEL NOMENCLATURE

A well-developed nomenclature system is necessary to prevent confusion and to relay as much information as possible without being overly cumbersome. A nomenclature system needs to account for species-specific information so that the exact inbred line used and project-specific metadata can be accessed easily. The change from GRMZM IDs to the new nomenclature was made for several reasons. The main reason is to indicate which maize line the models are derived from. This is particularly important in maize, which is well documented to contain substantial presence/absence variation (PAV) and copy number variation (CNV) across inbred lines. To make this transition easier, older maize nomenclature is retained as a synonym and can be used to look up gene models at MaizeGDB. Specific details on the current maize nomenclature standards in use can be found here and here.
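
As a rough illustration of how an assembly name can carry line and project metadata, the Python sketch below splits the two assembly names mentioned on this page (Zm-B73-REFERENCE-GRAMENE-4.0 and Zm-B73-REFERENCE-NAM-5.0) into their hyphen-separated parts. The field labels (species, line, quality, project, version) and the function name are assumptions made for this example only; they are inferred from those two names and are not the official definition, which is given in the nomenclature standards.

```python
from typing import NamedTuple


class AssemblyName(NamedTuple):
    # Field labels are assumptions for illustration, not official terms.
    species: str   # e.g. "Zm"
    line: str      # e.g. "B73"
    quality: str   # e.g. "REFERENCE"
    project: str   # e.g. "NAM" or "GRAMENE"
    version: str   # e.g. "5.0"


def parse_assembly_name(name: str) -> AssemblyName:
    """Split an assembly identifier into five hyphen-separated fields.

    The first field is taken from the left and the last three from the
    right, so a line name that itself contains a hyphen stays intact.
    """
    species, sep, rest = name.partition("-")
    parts = rest.rsplit("-", 3)
    if not species or not sep or len(parts) != 4 or not all(parts):
        raise ValueError(f"unrecognized assembly name: {name!r}")
    line, quality, project, version = parts
    return AssemblyName(species, line, quality, project, version)


if __name__ == "__main__":
    for name in ("Zm-B73-REFERENCE-GRAMENE-4.0", "Zm-B73-REFERENCE-NAM-5.0"):
        print(parse_assembly_name(name))
```

Splitting from both ends rather than on every hyphen is a design choice for this sketch; it avoids mis-parsing a line name that contains a hyphen, but it is not a substitute for validating names against the published standard.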



MAIZEGDB ACCEPTS FUNCTIONAL ANNOTATION

Functional annotation can mean different things to different people. It generally involves attaching information regarding gene product identity, biological or biochemical function, expression, regulation, and interactions to a genomic DNA sequence. Are you generating RNAseq data and wish for that to be aligned to assemblies to show that the genes in a particular region are expressed? Do you have a mutation for a gene that is mapped to a genome assembly and the mutant phenotype is known? Have you experimentally determined the temporal and spatial regulation of a small group of transcription factors? MaizeGDB is interested in both small and large functional annotation data sets determined by either in silico analysis or experimental validation. Contact us at MaizeGDB to find out how your functional annotations can be included in the MaizeGDB resource.

In addition to the types of functional annotations already described, we at MaizeGDB accept functional annotations that are based upon assignment of terms from the Gene Ontologies (GO; http://www.geneontology.org) to gene structures. When GO terms are assigned to a particular gene, standard Evidence Codes are required to document how the inference of function was made. For example, an annotation made on the basis of a published, peer-reviewed experiment would have the evidence code EXP, whereas an annotation made on the basis of an enzyme assay would have the more specific evidence code IDA. Evidence Codes used by the Gene Ontology Consortium are available here.
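
To make the role of evidence codes concrete, the Python sketch below separates a handful of GO annotations into those backed by experimental evidence and those inferred in silico. The code sets are a partial list of Gene Ontology Consortium evidence codes chosen for illustration, the grouping of IEA with the computational-analysis codes is a simplification, and the gene identifiers and records are made up for this example; none of this reflects a MaizeGDB submission format.

```python
# Partial lists of GO evidence codes, grouped for this illustration only.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
IN_SILICO_CODES = {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA", "IEA"}


def evidence_class(code: str) -> str:
    """Return 'experimental', 'in_silico', or 'other' for an evidence code."""
    code = code.strip().upper()
    if code in EXPERIMENTAL_CODES:
        return "experimental"
    if code in IN_SILICO_CODES:
        return "in_silico"
    return "other"


def split_by_evidence(annotations):
    """Group (gene_id, go_term, evidence_code) tuples by evidence class."""
    groups = {"experimental": [], "in_silico": [], "other": []}
    for record in annotations:
        groups[evidence_class(record[2])].append(record)
    return groups


if __name__ == "__main__":
    # Hypothetical records: the gene IDs are placeholders, not real models.
    example = [
        ("gene_A", "GO:0016301", "IDA"),
        ("gene_B", "GO:0003700", "IEA"),
        ("gene_C", "GO:0005634", "TAS"),
    ]
    for label, records in split_by_evidence(example).items():
        print(label, records)
```

Codes outside the two sets (for example, author statements such as TAS) fall into "other" rather than being guessed at, so an incomplete code list cannot silently mislabel an annotation as experimental.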



SUGGESTED GUIDELINES FOR RESEARCH GROUPS PLANNING TO SEQUENCE, ASSEMBLE, AND ANNOTATE A MAIZE GENOME FOR SUBMISSION TO MAIZEGDB

A plan for providing documentation that is complete, accurate, and timely. A centrally accessible plan should be made available at the time that your project begins and include a timeline for data delivery. Functional and structural annotation should be provided with standard evidence codes, clearly discriminating annotation with experimental evidence from purely in silico analyses.

A plan for developing a close working relationship with MaizeGDB as the ultimate disseminators of the information. Assemblies and annotations should be delivered to MaizeGDB regularly and in a timely fashion. MaizeGDB can display the deliverable dates so as to keep the community informed. Ideally, these dates should be known in advance, and should be adhered to if at all possible. MaizeGDB will create a genome assembly webpage to display your project metadata for your genome assembly. It is understood that delays can occur. The intent here is to make the process more transparent to the research community.

A mechanism for interacting with the maize community directly and with a single voice. Maize researchers comprise a vibrant community with researchers at all levels in both the public and private sectors. A bidirectional means of communicating with the maize community should be deployed at the start of the project so that the maize community can both absorb and respond to new project information quickly. The goal is to provide all community members with the same information at the same time so that they can plan their research activities accordingly. This can be accomplished in many ways (FAQs, blogs, social media, conferences, etc.) and all options should be considered so as to reach the largest number of stakeholders.

A robust way to capture genome assembly and annotation information from the community. For any genome assembly, researchers often have high-quality structural and functional annotations for their genes of interest, both stored on lab computers and documented in publications. Researchers are usually willing to share this information freely, but there is currently no robust means to capture it. Groups developing genome assemblies are encouraged to work with MaizeGDB to develop a plan for collecting high-value annotations that are specific to their assemblies. All annotations submitted by community members for specific genome assemblies will be vetted by MaizeGDB curators and then incorporated into the assembly, with an indication of who provided the data. Comparatively little data is expected to enter the assembly process in this way, but these data should be of very high quality.

See information about submitting genome assemblies to MaizeGDB here.