MaizeGDB is a community-oriented, long-term, federally funded
informatics service to researchers focused on the crop plant and
model organism Zea mays.
MaizeGDB is a founding member of
AgBioData,
a consortium of agriculture-related online resources that is
committed to making agriculture-related research data FAIR.
Tom Brutnell, Erik Vollbrecht, Hugo Dooner, Karen Koch, Don McCarty, Chunguang Du, Omer Barad, Ed Buckler, Doreen Ware, Georg Jander, Gil Ben-Zvi, Ilya Soifer, Kobi Baruch, Doron Shem-Tov, NRgene
Publication status
published
Project reference
The maize W22 genome provides a foundation for functional genomics and transposon biology.
Springer NM, Anderson SN, Andorf CM, Ahern KR, Bai F, Barad O, Barbazuk WB, Bass HW, Baruch K, Ben-Zvi G, Buckler ES, Bukowski R, Campbell MS, Cannon EKS, Chomet P, Dawe RK, Davenport R, Dooner HK, Du LH, Du C, Easterling KA, Gault C, Guan JC, Hunter CT, Jander G, Jiao Y, Koch KE, Kol G, Köllner TG, Kudo T, Li Q, Lu F, Mayfield-Jones D, Mei W, McCarty DR, Noshay JM, Portwood JL 2nd, Ronen G, Settles AM, Shem-Tov D, Shi J, Soifer I, Stein JC, Stitzer MC, Suzuki M, Vera DL, Vollbrecht E, Vrebalov JT, Ware D, Wei S, Wimalanathan K, Woodhouse MR, Xiong, Brutnell TP.
PMID DOI
Sequence service provider: Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois
Sequencing method: Illumina short read and 10x Genomics
Sequencing hardware: Illumina short read and 10x Genomics
Genome coverage: 210x
Assembly description
Assembly methods: DenovoMAGIC
Construction of pseudomolecules: Scaffolds were ordered and oriented
Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois
Assembly statistics
Scaff num
306
Longest scaff
83,688,765 bp
N50 scaff length
35,520,102 bp
N50 scaff count
18
N90 scaff length
10,997,073 bp
N90 scaff count
58
Scaff num: Total number of scaffolds in the assembly.
Longest scaff: Length of the longest scaffold in the assembly.
N50 scaff length: The length of the scaffold that takes the cumulative length (summing from longest to shortest scaffold) past 50% of the total assembly size.
N50 scaff count: The number of scaffolds counted in reaching the N50 threshold.
N90 scaff length: The length of the scaffold that takes the cumulative length (summing from longest to shortest scaffold) past 90% of the total assembly size.
N90 scaff count: The number of scaffolds counted in reaching the N90 threshold.
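The N50 and N90 definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to produce the statistics on this page: the function name `nx_stats` and the toy scaffold lengths are made up, and the sketch assumes the threshold counts as reached when the cumulative length is greater than or equal to the given percentage of the total.

```python
def nx_stats(lengths, percent):
    """Return (NX scaffold length, NX scaffold count) for a list of lengths in bp.

    Hypothetical helper for illustration; `percent` is 50 for N50, 90 for N90.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        # Assumption: ">=" is used, so hitting the threshold exactly counts.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: cumulative length always reaches the total")


# Toy scaffold lengths (made up, not taken from the W22 assembly).
toy = [90, 70, 50, 40, 30, 20]
n50_length, n50_count = nx_stats(toy, 50)
n90_length, n90_count = nx_stats(toy, 90)
```

With the real assembly, the list of lengths would come from the 306 scaffold sequences, and the two calls would correspond to the "N50 scaff length / count" and "N90 scaff length / count" rows.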
A contig is a contiguous consensus sequence that is
derived from a collection of overlapping reads.
A scaffold is a set of ordered and oriented contigs
that are linked to one another by mate pairs of sequencing reads.
Annotation
Annotation Identifier
Zm00004b.1
Annotation Provider
Yinping Jiao, Ware lab
Annotation Date
May, 2017
Is current
yes
Annotation Software
MAKER-P
Annotation Description
Annotation of protein-coding genes was performed using the MAKER-P pipeline software (Campbell et al. 2014), with parameters and evidence similar to those recently used to annotate B73 (Law et al. 2015; Jiao et al. 2016). Repeat masking by RepeatMasker was performed using exemplar transposon sequences (Schnable et al. 2009) available online at the maize transposable element database. We excluded helitron and MULE elements to avoid false-positive masking from captured exon sequences in such elements. Gene expression evidence included PacBio Iso-seq long reads sequenced from cDNA libraries of six tissues in B73 (n=111,151) (Wang et al. 2016). In addition, we included the following transcriptome assemblies, each processed to exclude short transcripts (<300 bp) and redundancies based on application of CD-HIT (Fu et al. 2012): 1) a pooled set of 94 transcriptome assemblies constructed from publicly available RNA-seq reads (n=508,233) (Law et al. 2015), 2) a transcriptome assembly of B73 seedlings (n=112,963) (Martin et al. 2014), and 3) a transcriptome assembly of W22 tissues (n=589,743). Cross-species evidence was supplied in the form of the following annotated protein files downloaded from Gramene release 46 (Gramene FTP) (Tello-Ruiz et al. 2016): 1) Arabidopsis_thaliana.TAIR10.27.pep.all.fa, 2) Brachypodium_distachyon.v1.0.27.pep.all.fa, 3) Oryza_sativa.IRGSP-1.0.27.pep.all.fa, 4) Setaria_italica.JGIv2.0.27.pep.all.fa, and 5) Sorghum_bicolor.Sorbi1.27.pep.all.fa. Alignment and downstream processing of sequence evidence to the repeat-masked W22 reference was performed within the MAKER-P pipeline using default parameters. For gene model prediction, the pipeline incorporated AUGUSTUS (Stanke et al. 2006) applied with the maize5 model and FGENESH (Salamov and Solovyev 2000) applied with the monocot model.
Stable gene identifiers were assigned using the format Zm00004bXXXXXX (where the X's represent a random 6-digit number), as specified under A Standard For Maize Genetics Nomenclature available at MaizeGDB.
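As a rough illustration of the identifier format described above, the following Python sketch checks whether a string matches the pattern Zm00004b followed by exactly six digits. The function name `is_w22_gene_id` and the example strings are hypothetical and are not real gene models; the sketch checks only the shape of the identifier, not whether the gene exists in the annotation.

```python
import re

# Pattern taken from the format stated in the text: the fixed prefix
# "Zm00004b" followed by exactly six ASCII digits.
W22_GENE_ID = re.compile(r"Zm00004b[0-9]{6}")


def is_w22_gene_id(identifier):
    """Return True if `identifier` has the Zm00004bXXXXXX shape.

    Hypothetical helper for illustration only.
    """
    return W22_GENE_ID.fullmatch(identifier) is not None


# Made-up example strings, used only to exercise the pattern.
examples = ["Zm00004b012345", "Zm00004b12345", "Zm00001d012345"]
matches = [is_w22_gene_id(s) for s in examples]
```

Using `fullmatch` rather than `search` means that strings with extra leading or trailing characters, such as a transcript suffix, are rejected instead of partially matched.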