Maize Pan-genome Variant Effect Prediction Tool

Gene Summary

Genome: Zm-B73-REFERENCE-NAM-5.0
Gene model:
Transcript:
Canonical:
Gene Symbol:
Gene Name:

Protein Summary

Protein:
Protein Length:
Pan MSA length:
UniProt:
UniProt Description:
3D Structure:

GWAS-based traits

Variant Effects Search

Examples: Zm00001eb260000, Zm00001eb268770_T001, Zm00001eb169850_P001, lg1, wx1

Summary: This tool visualizes the potential effects of missense variants on proteins, showing the predicted impact of every amino acid substitution at each residue. Using the ESM protein language model through the esm-variants tool, it calculates variant effect scores as the log-likelihood ratio between the variant and its wild type. Scores above -7 indicate benign outcomes, while scores below -7 suggest possible phenotypic effects. A score of 0 means the variant and the wild type are the same. The heatmap colors range from blue (benign effects) through green and yellow (mild effects) to red (phenotypic effects). Pfam domains and predicted secondary structures are also shown to give context on the protein's functional domains.
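
As a rough illustration of the scoring rule in the summary, the sketch below computes a log-likelihood ratio from two residue probabilities and applies the -7 cutoff. The function names (llr_score, classify) and the handling of a score of exactly -7 are assumptions made for this example; they are not taken from esm-variants or PanEffect, where the probabilities come from the ESM model itself.

```python
import math

BENIGN_THRESHOLD = -7.0  # cutoff quoted in the summary


def llr_score(p_variant: float, p_wildtype: float) -> float:
    """Log-likelihood ratio of the variant residue versus the wild-type residue."""
    return math.log(p_variant) - math.log(p_wildtype)


def classify(score: float) -> str:
    """Label a score using the -7 cutoff.

    Assumption: a score of exactly -7 is treated as a possible effect,
    since the text only defines the cases above and below -7.
    """
    if score == 0:
        return "wild-type"  # variant and wild type are the same residue
    if score > BENIGN_THRESHOLD:
        return "benign"
    return "possible phenotypic effect"
```

For example, classify(llr_score(0.001, 0.9)) falls just above the cutoff and is labeled benign, while a score of -9.5 would be flagged as a possible phenotypic effect.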

This tool has two views:

Variant effects in B73: This view shows the predicted impact of all possible amino acid substitutions for proteins in the B73 genome. It is available for all isoforms.

Variant effects across the pan-genome: This view shows the predicted impact of the natural variation in maize. Using B73 as a reference, all the proteins in the pan-gene set are aligned and the variations are color-coded based on potential phenotypic impact. Note: insertions and deletions are shown but receive no effect score, and only the canonical isoform is used for each gene model.
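
The pan-genome view can be pictured as walking an alignment column by column and looking up the B73 substitution score for whatever residue the target protein carries. The following is a minimal sketch under that reading; score_alignment and the b73_scores lookup table are hypothetical names for this example and do not describe the actual PanEffect scripts.

```python
from typing import Dict, List, Optional, Tuple

GAP = "-"


def score_alignment(
    ref_aln: str,
    target_aln: str,
    b73_scores: Dict[Tuple[int, str], float],
) -> List[Tuple[int, str, str, Optional[float]]]:
    """Return (B73 position, B73 residue, target residue, score) per aligned column.

    b73_scores maps (1-based B73 position, substituted residue) to a score.
    Columns with a gap on either side get no score (None).
    """
    if len(ref_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    rows = []
    ref_pos = 0  # stays on the previous residue while the reference has a gap
    for ref_aa, tgt_aa in zip(ref_aln, target_aln):
        if ref_aa != GAP:
            ref_pos += 1
        if ref_aa == GAP or tgt_aa == GAP:
            score = None  # insertion or deletion: shown, but not scored
        elif ref_aa == tgt_aa:
            score = 0.0  # identical to the B73 wild type
        else:
            score = b73_scores.get((ref_pos, tgt_aa))
        rows.append((ref_pos, ref_aa, tgt_aa, score))
    return rows
```

With ref_aln = "MKT-A", target_aln = "MRTGA", and a single table entry {(2, "R"): -8.1}, the second column is scored -8.1 and the inserted G in the fourth column is left unscored.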

Maize Variant Effects Tool

The maize variant effects tool explores the potential impacts of missense variants on maize proteins. The tool is organized into four sections:

Gene Summary: This section summarizes the annotations for the gene model and its protein. It includes gene and gene model annotations, protein annotations, links to 3D structure tools, and trait data from three collections of genome-wide association studies.
Variant Effects in B73: The first variant effects view visualizes the effects of all possible amino acid substitutions in the B73 maize line. This section has two heatmaps, one giving a broad overview of the full protein and another giving a detailed view of a zoomed-in region. The width of the heatmap represents each position of the reference protein and the height represents the 20 possible amino acid substitutions. The cells within the heatmap shift from blue (benign outcomes) to red (strong phenotypic impact). Hovering over a cell shows additional information about the substitution, including position, substitution, and score. A slider bar in the center of the page controls which portion of the protein is shown in the zoomed view. Two tracks above the heatmap show the locations of Pfam domains and predicted secondary structures, which give functional and structural context for the consequences of amino acid substitutions.
Variant Effects across the Pan-genome: The second view also displays a broad and a detailed heatmap, but here they show the effects of naturally occurring variation on maize proteins from across the pan-genome. The width of the heatmap represents each position of the reference protein, while the height represents each protein in the pan-genome aligned to the reference protein. Insertions and deletions in the alignments are represented by a ‘-’. The cells shift from blue (benign outcomes) to red (strong phenotypic impact) for variants within the pan-genome as compared to the reference protein. Hovering over a cell shows additional information about the substitution, including B73 position, target position, target genome, target gene model, substitution, and score. This view is only available for the canonical transcript of each gene model.
Search: The search section has a search bar and a brief description of the tool. The search accepts gene model, transcript, and protein identifiers.
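
To make the B73 heatmap layout concrete, the sketch below arranges substitution scores into a grid with one column per protein position and one row per amino acid, then slices out a zoomed region the way the slider does. The row order in AMINO_ACIDS, the function names, and the score table are assumptions for this example, not the layout used by the site.

```python
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # row order is an assumption

Grid = List[List[Optional[float]]]


def build_heatmap(wild_type: str, scores: Dict[Tuple[int, str], float]) -> Grid:
    """Return a 20 x len(wild_type) grid: rows are substitutions, columns are positions.

    scores maps (1-based position, substituted residue) to a score.
    Cells with no score available are left as None.
    """
    grid: Grid = [[None] * len(wild_type) for _ in AMINO_ACIDS]
    for x, wt_aa in enumerate(wild_type):
        for y, sub_aa in enumerate(AMINO_ACIDS):
            if sub_aa == wt_aa:
                grid[y][x] = 0.0  # same residue as the wild type
            else:
                grid[y][x] = scores.get((x + 1, sub_aa))
    return grid


def zoom(grid: Grid, start: int, end: int) -> Grid:
    """Return the columns in [start, end) for the zoomed-in heatmap."""
    return [row[start:end] for row in grid]
```

Each cell then carries exactly the information shown on hover: the position (column), the substitution (row), and the score.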

Data Sources:

PanEffect integrates seven different datasets to explore the potential phenotypic consequences of missense mutations. A variety of tools and datasets were used to generate the input files for PanEffect. The code is available on GitHub and the data are available from the MaizeGDB downloads.

Variant effect scores in B73: The variant effect scores are calculated using the ESM protein language model (Z. Lin et al., 2023) through the esm-variants tool (Brandes et al., 2023). The esm-variants tool calculates variant effect scores as the log-likelihood ratio between the variant and its wild type. Scores above -7 indicate benign outcomes, while scores below -7 suggest possible phenotypic effects (Brandes et al., 2023). A score of 0 means the variant and the wild type are the same. PanEffect provides a Python script to convert the esm-variants output into a CSV file containing the x and y positions in the heatmap, the variant score, and the amino acid codes for the wild type (B73) and the substitution.
Pan-genome multiple sequence alignments: The multiple sequence alignments (MSAs) were computed using the software package Pandagma, which uses an all-vs-all approach to align every protein in a given pan-gene group. Pandagma generated the multiple sequence alignments for the maize reference genomes in the pan-genome, including three versions of B73 (Hufford et al., 2021; Jiao et al., 2017), a set of 25 diverse maize lines called the Nested Association Mapping (NAM) panel, 12 lines used in Chinese breeding programs, 4 European lines, 12 lines from the Andropogoneae tribe of grasses, and an additional set of high-quality maize lines available at MaizeGDB. The maize genome B73 RefGen_v5 (Zm-B73-REFERENCE-NAM-5.0) is used as the reference and contains 39,755 gene models and 75,539 transcripts.
Variant effect scores in the pan-genome: The variant effect scores for B73 were combined with the pan-genome MSAs to create heatmap representations of variant effects across the maize pan-genome. Using B73 as a reference, the natural variation in each protein of the pan-genome was given a score, with B73 as the wild type and the target protein as the variant substitution. On GitHub, PanEffect has a Python script that combines these two data types to create a TSV file listing the x and y positions in the heatmap, the variant score, the positions of the amino acid in B73 and the target protein, and the amino acid codes at those positions.
Secondary protein structures: MaizeGDB has predicted the three-dimensional protein structures for all protein isoforms in the B73 genome (Woodhouse et al., 2023). These structures are used as input to the DSSP tool (Kabsch and Sander, 1983) to assign secondary structures (alpha helices and beta sheets) to the proteins. The GitHub repository has a Python script that converts the DSSP output to a TSV file listing the position, amino acid, and secondary structure code for each B73 isoform.
Pfam domains: The locations of Pfam functional domains for the B73 proteome were calculated by the NAM sequencing consortium using InterProScan and are available at MaizeGDB. The GitHub repository has a Python script that generates a separate TSV file containing the InterProScan results (including Pfam position, ID, name, and Gene Ontology terms) for each B73 transcript.
Functional annotations: MaizeGDB hosts a set of functional annotations for maize genome assemblies including UniProt annotations, canonical gene transcript IDs, and manually annotated gene names and symbols for each gene model in B73. In GitHub, PanEffect has a Python script to parse this information into a TSV file for each B73 gene model.
GWAS trait annotations: MaizeGDB hosts three GWAS atlas datasets. Each of these datasets compiled and/or generated sets of single nucleotide polymorphisms (SNPs) linked to specific traits using genome-wide association studies (GWAS). The first set was compiled in 2014 (Wallace et al., 2014) and has GWAS mappings for over 40 traits to 40,000 SNP locations in the B73 genome. The second set is from the 2022 update of the GWAS Atlas database (Tian et al., 2020) and combines GWAS data from 133 papers covering 531 studies for 279 traits across 42,000 SNP loci. The third dataset is from a 2022 study (Li et al., 2022) that performed GWAS for 21 important agronomic traits across 1,604 inbred lines and identified 2,360 significant associations at 1,847 SNP loci. PanEffect has a Python script that finds any SNP position within 1,000 base pairs of the start and end position of a B73 gene model. The data from the three datasets are merged into a final TSV file for each gene model, listing the trait name and which study it came from.
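
The SNP-to-gene assignment described for the GWAS trait annotations can be sketched as a simple window test. This example assumes the rule means any SNP on the same chromosome that lies between 1,000 bp upstream of the gene start and 1,000 bp downstream of the gene end; the Gene and Snp records and the snps_near_gene function are hypothetical and are not the PanEffect script.

```python
from dataclasses import dataclass
from typing import List

WINDOW_BP = 1000  # flanking distance quoted in the text


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


@dataclass
class Snp:
    chrom: str
    pos: int
    trait: str
    study: str


def snps_near_gene(gene: Gene, snps: List[Snp], window: int = WINDOW_BP) -> List[Snp]:
    """Return SNPs on the gene's chromosome within `window` bp of the gene span.

    Assumption: the window is inclusive and extends from start - window
    to end + window.
    """
    lo = gene.start - window
    hi = gene.end + window
    return [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]
```

Running this once per gene model over the merged SNP lists from the three datasets would yield the per-gene trait and study pairs that the final TSV file records.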

Downloads: