MaizeGDB is a community-oriented, long-term, federally funded
informatics service to researchers focused on the crop plant and
model organism Zea mays.
MaizeGDB is a founding member of
AgBioData,
a consortium of agriculture-related online resources that is
committed to making agriculture-related research data FAIR.
Yikun Zhao, Yuancong Wang, De Ma, Guang Feng, Yongxue Huo, Zhihao Liu, Ling Zhou, Yunlong Zhang, Liwen Xu, Liang Wang, Han Zhao, Jiuran Zhao, Fengge Wang
Funding
This research was supported by grants from the special project for the construction of scientific and technological innovation capacity of the Beijing Academy of Agriculture and Forestry Sciences (No. KJCX20200305).
Publication status
Published
Project reference
A chromosome-level genome assembly and annotation of the maize elite breeding line Dan340.
Yikun Zhao, Yuancong Wang, De Ma, Guang Feng, Yongxue Huo, Zhihao Liu, Ling Zhou, Yunlong Zhang, Liwen Xu, Liang Wang, Han Zhao, Jiuran Zhao, Fengge Wang
DOI
Stock and Biosample Information
Stock information
Stock name
Dan340
Stock provided by
Beijing Academy of Agriculture and Forestry Sciences (BAAFS)
Maize Research Center, Beijing Academy of Agriculture and Forestry Sciences (BAAFS)/Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing 100097, China
Scaffold N50: the length of the scaffold at which the cumulative length, summing from the longest scaffold to the shortest, passes 50% of the total assembly size.
Total sequence length represented by contigs.
The longest contig.
The shortest contig.
Contig N50: the length of the contig at which the cumulative length, summing from the longest contig to the shortest, passes 50% of the total assembly size.
A contig is a contiguous consensus sequence that is
derived from a collection of overlapping reads.
A scaffold is a set of ordered and oriented contigs
that are linked to one another by mate pairs of sequencing reads.
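The N50 definitions above can be made concrete with a short sketch. This is an illustration only, not the code used to produce the statistics on this page; the function name `n50` and the example lengths are hypothetical, and the sketch assumes the common convention that the threshold is reached when the running sum is at least half of the total.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumption: the threshold counts as reached when the running sum
    is greater than or equal to 50% of the total length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths (in bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

With these example lengths the total is 290, so the running sum reaches the halfway point (145) at the second-longest sequence, and the N50 is 70.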
Annotation
Annotation Identifier
Zm00104aa.1
Annotation Date
2022-05-07
Is current
yes
Annotation Description
Repeat sequences of the Dan340 genome were annotated using both ab initio and homolog-based search methods. For the ab initio prediction, RepeatModeler (Version 1.0.8), RepeatScout (Version 1.0.5), and LTR_Finder were used to discover transposable elements (TEs) and to build a TE library. The integrated TE library and a known repeat library (Repbase Version 15.02, homolog-based) were then passed to RepeatMasker (Version 3.3.0) to predict TEs. For the homolog-based prediction, RepeatProteinMask was run to detect TEs by comparing the genome against a TE protein database. Tandem repeats were identified using Tandem Repeats Finder (Version 4.07b). In total, 1723.99 Mb of repeat sequences were identified, accounting for 73.40% of the genome size. Among these repeat sequences, 1555.57 Mb were predicted to be long-terminal repeat (LTR) retrotransposons, and 44.53 Mb were predicted to be DNA transposons, accounting for 66.23% and 1.60% of the genome, respectively. Among the LTR retrotransposons, the Gypsy and Copia superfamilies comprised 23.81% and 12.75% of the genome, respectively. Retrotransposons thus accounted for a large proportion of the Dan340 genome, consistent with the genomic characteristics of other maize inbred lines.
All repetitive regions except the tandem repeats were soft-masked for protein-coding gene annotation. Five ab initio gene prediction programs, Augustus (Version 3.0.2), GENSCAN (Version 1.0), GeneID, GlimmerHMM (Version 3.0.2), and SNAP (Version 2013-02-16), were used to predict genes. In addition, the protein sequences of five homologous species (Sorghum bicolor, Setaria italica, Hordeum vulgare, Triticum aestivum, and Oryza sativa) were downloaded from Ensembl and NCBI. Homologous sequences were aligned against the genome using TBLASTN (E-value 1 × 10⁻⁵). GeneWise was employed to predict gene models based on the sequence alignment results.
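Soft-masking conventionally marks repeat regions by writing their bases in lowercase while leaving the sequence itself intact. As a rough illustration of what that means for a FASTA file, the sketch below counts the fraction of lowercase bases. It is not part of the annotation pipeline described above; the function name `soft_masked_fraction` and the toy sequence are hypothetical, and the count includes any lowercase character, including a lowercase `n`.

```python
import io


def soft_masked_fraction(fasta_handle):
    """Return the fraction of bases written in lowercase in a FASTA stream.

    Assumption: lowercase letters mark soft-masked (repeat) regions.
    """
    masked = 0
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):  # skip header lines
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no sequence data found")
    return masked / total


# Toy FASTA record, for illustration only (not Dan340 data).
example = io.StringIO(">chr_demo\nACGTacgtNN\nacgtACGTAC\n")
fraction = soft_masked_fraction(example)
```

In the toy record, 8 of the 20 bases are lowercase, so the function returns 0.4. For a real assembly, the same function could be called on an open file handle instead of the in-memory string.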