Data Metrics
Molecular dynamics (MD) analysis metrics
SASA Δ indicates the change (Δ) in SASA (Solvent Accessible Surface Area) between the WT and the mutated residue in the molecular dynamics simulations. Note that the value reports the absolute SASA difference, so both the side-chain positioning and the intrinsic properties of the swapped residues affect the reported value. Negative and positive values indicate an increase or a decrease in water exposure, respectively.
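The following is a minimal sketch of how a SASA Δ could be derived once per-frame SASA values for the residue are available from the trajectories. The input layout (one list of per-frame values per replica) and the function names are hypothetical. The sign convention Δ = SASA(WT) - SASA(variant) is inferred from the interpretation given above and is not confirmed by the source.

```python
from statistics import mean
from typing import Sequence


def mean_residue_sasa(replicas: Sequence[Sequence[float]]) -> float:
    """Average residue SASA (in Å^2) over every frame of every replica."""
    return mean(value for frames in replicas for value in frames)


def sasa_delta(wt_replicas: Sequence[Sequence[float]],
               variant_replicas: Sequence[Sequence[float]]) -> float:
    """SASA Δ = WT - variant (assumed sign): negative means the variant is more exposed."""
    return mean_residue_sasa(wt_replicas) - mean_residue_sasa(variant_replicas)


# Hypothetical per-frame SASA values (Å^2) for two short replicas of each system.
wt = [[52.0, 48.0], [50.0, 50.0]]
variant = [[81.0, 79.0], [80.0, 80.0]]
print(sasa_delta(wt, variant))  # -30.0: the mutated residue is more water-exposed
```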
Normalized B-factor Δ values are averaged over triplicate 150 ns simulations for the side-chain atoms of the WT and variant systems. The reported value is the change (Δ) in the residue's normalized B-factor between the WT and the variant. The B-factor is the squared RMSF (Root Mean Square Fluctuation) weighted by 8π²/3, i.e., B = (8π²/3)·RMSF². The normalized B-factor uses the mass-weighted side-chain B-factors from unrestrained simulations of isolated amino acids as normalization factors. This corrects for differences in the residues' connectivity and size and yields a dimensionless relative local side-chain flexibility, where 1.0 corresponds to the isolated amino acid (Fuchs et al., 2015). All calculations were carried out with CTRAJ and CPPTRAJ (DOI: 10.1021/ct500633u).
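The B-factor arithmetic described above can be sketched as follows. The code assumes that per-atom side-chain RMSF values (Å), atomic masses, and the isolated-amino-acid reference B-factor are already available. All function names are hypothetical. The source does not state the direction of the reported Δ, so the variant - WT sign used here is an assumption.

```python
import math
from statistics import mean
from typing import Sequence

# B = (8*pi^2/3) * RMSF^2 relates an atomic RMSF (Å) to an isotropic B-factor (Å^2).
B_PREFACTOR = 8.0 * math.pi ** 2 / 3.0


def bfactor_from_rmsf(rmsf: float) -> float:
    """Convert an atomic RMSF (Å) into a B-factor (Å^2)."""
    return B_PREFACTOR * rmsf ** 2


def sidechain_bfactor(rmsf: Sequence[float], masses: Sequence[float]) -> float:
    """Mass-weighted mean B-factor over the side-chain atoms of one residue."""
    weighted = sum(m * bfactor_from_rmsf(r) for r, m in zip(rmsf, masses))
    return weighted / sum(masses)


def normalized_bfactor(rmsf: Sequence[float], masses: Sequence[float],
                       isolated_reference_b: float) -> float:
    """Divide by the isolated amino acid's side-chain B-factor (that reference maps to 1.0)."""
    return sidechain_bfactor(rmsf, masses) / isolated_reference_b


def normalized_bfactor_delta(wt_per_replica: Sequence[float],
                             variant_per_replica: Sequence[float]) -> float:
    """Average over the replicas, then report variant - WT (assumed sign convention)."""
    return mean(variant_per_replica) - mean(wt_per_replica)


# Hypothetical normalized side-chain B-factors from three replicas per system.
delta = normalized_bfactor_delta([1.10, 1.00, 0.90], [1.40, 1.30, 1.20])
print(f"{delta:.2f}")  # 0.30: the variant side chain is more flexible
```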
Status supported by evidence
The c.dna column gives the HGVSc (Human Genome Variation Society coding DNA) nucleotide notation (e.g., c.1205T>G). The MANE (Matched Annotation from NCBI and EMBL-EBI) transcript, with the GenBank accession number NM_006772.3, is the same one used in ClinVar. This SYNGAP1 transcript matches the Ensembl transcript ID ENST00000646630.1 used in gnomAD.
The variant name of each missense variant describes the protein change using the single-letter amino acid code and the residue number.
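Below is a minimal sketch of how a c.dna entry and a variant name could be cross-checked. It assumes both use the coding and residue numbering of the transcript above, where c.1 is the first base of the ATG start codon. The regular expressions cover only simple substitutions such as c.1205T>G and one-letter names. Intronic offsets (e.g., c.123+1G>A), indels, and UTR positions are deliberately rejected. The function names and the "A402V"-style name format are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

# Simple coding-DNA substitutions only, e.g. "c.1205T>G".
CDNA_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# One-letter missense names: WT residue, residue number, variant residue.
MISSENSE_NAME = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


class CodingSubstitution(NamedTuple):
    position: int  # coding position; c.1 is the A of the ATG start codon
    ref: str
    alt: str


def parse_cdna(code: str) -> CodingSubstitution:
    match = CDNA_SUBSTITUTION.match(code.strip())
    if match is None:
        raise ValueError(f"not a simple coding substitution: {code!r}")
    position, ref, alt = match.groups()
    return CodingSubstitution(int(position), ref, alt)


def codon_number(position: int) -> int:
    """Residue (codon) number that contains a coding position, counting from 1."""
    return (position - 1) // 3 + 1


def parse_variant_name(name: str) -> Tuple[str, int, str]:
    match = MISSENSE_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a one-letter missense name: {name!r}")
    wt, position, variant = match.groups()
    return wt, int(position), variant


def positions_agree(cdna: str, variant_name: str) -> bool:
    """True when the c.dna entry and the variant name point at the same residue."""
    return codon_number(parse_cdna(cdna).position) == parse_variant_name(variant_name)[1]


print(codon_number(parse_cdna("c.1205T>G").position))  # 402
```

The check compares only residue numbers. Confirming the amino acid change itself would also require the transcript sequence, which this sketch does not use.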
ClinVar, a free, public archive of reports on the relationships between human variations and phenotypes, provides the publicly available variant status (benign, pathogenic, likely benign, likely pathogenic, or VUS) together with supporting evidence. The Clinical Status column links to the corresponding ClinVar entries. The ClinVar data in the table were last retrieved on 2024-01-04.
gnomAD (Genome Aggregation Database) provides a comprehensive collection of both exome and genome sequencing data. The gnomAD v4 information for the relevant missense variants is readily accessible. (DOI: 10.1038/s41586-020-2308-7)
Physical metrics
MW Δ indicates the change (Δ) in molecular weight (g/mol) caused by the side-chain swap between the WT and the mutated residue. Negative and positive values indicate an increase or a decrease in molecular weight due to the mutation, respectively.
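The MW Δ above can be sketched with a table of average amino acid molecular weights. The sign Δ = WT - variant follows from the interpretation stated above. The weight table is a standard one chosen for illustration and is not necessarily the one used for the published values. Differences between residues are the same whether free amino acids or peptide-bound residues are used, because the water lost on peptide-bond formation cancels out.

```python
# Average molecular weights (g/mol) of the free amino acids (illustrative table).
AMINO_ACID_MW = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}


def mw_delta(wt: str, variant: str) -> float:
    """MW Δ = WT - variant (g/mol): negative when the mutation adds mass."""
    return round(AMINO_ACID_MW[wt] - AMINO_ACID_MW[variant], 2)


print(mw_delta("G", "W"))  # -129.16: glycine to tryptophan adds mass
```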
Hydropathy Δ gives the change (Δ) in the hydropathy index, which represents the hydrophobic or hydrophilic character of the residue side chain, caused by the missense mutation. Larger (positive) values indicate that the mutated residue is more hydrophobic than the WT residue, while smaller (negative) values indicate that it is more hydrophilic.
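Here is a minimal sketch of the hydropathy Δ. It assumes the Kyte-Doolittle hydropathy scale, since the text does not name the scale. It also assumes the sign variant - WT, so that larger values mean a more hydrophobic mutated residue, as described above.

```python
# Kyte-Doolittle hydropathy index (assumed scale; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_delta(wt: str, variant: str) -> float:
    """Hydropathy Δ = variant - WT: positive when the mutated residue is more hydrophobic."""
    return round(KYTE_DOOLITTLE[variant] - KYTE_DOOLITTLE[wt], 2)


print(hydropathy_delta("R", "I"))  # 9.0: arginine to isoleucine is strongly more hydrophobic
```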
Evolutionary substitution metrics
The PAM120 matrix is designed for comparing sequences separated by ~120 accepted point mutations per 100 amino acids. PAM (Point Accepted Mutation) substitution matrices, traditionally used in sequence alignment, encode the expected evolutionary change. Larger values indicate a more favorable (more frequently accepted) residue swap, while smaller, even negative, values indicate a less favorable one.
The PAM250 matrix is designed for comparing sequences separated by ~250 accepted point mutations per 100 amino acids. PAM (Point Accepted Mutation) substitution matrices, traditionally used in sequence alignment, encode the expected evolutionary change. Larger values indicate a more favorable (more frequently accepted) residue swap, while smaller, even negative, values indicate a less favorable one.
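Reading a PAM120 or PAM250 score for a WT-to-variant swap is a simple table lookup. The sketch below parses a matrix in the whitespace-separated text layout commonly used for distributed PAM/BLOSUM files. That layout has "#" comment lines, then a header row of residue letters, then one row per residue. No real matrix values are reproduced here, so the reader must supply the published table. The function names and file name are hypothetical.

```python
from typing import Dict, Tuple


def parse_substitution_matrix(text: str) -> Dict[Tuple[str, str], int]:
    """Parse a whitespace-separated substitution matrix into {(row, column): score}."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    header, body = rows[0], rows[1:]
    scores: Dict[Tuple[str, str], int] = {}
    for row in body:
        residue, values = row[0], row[1:]
        for column, value in zip(header, values):
            scores[(residue, column)] = int(value)
    return scores


def substitution_score(scores: Dict[Tuple[str, str], int], wt: str, variant: str) -> int:
    """Score for replacing the WT residue with the variant residue."""
    return scores[(wt, variant)]


# Usage, assuming a locally saved copy of the published matrix (file name hypothetical):
# with open("PAM250.txt") as handle:
#     pam250 = parse_substitution_matrix(handle.read())
# print(substitution_score(pam250, "W", "C"))
```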
Sequence- and structure-based pathogenicity predictions
SIFT (Sorting Intolerant From Tolerant) predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. The predictions were made with the default Single Protein and SIFT Sequence options, which provide automatic sequence alignment. (DOI: 10.1093/nar/gks539)
PolyPhen-2 (Polymorphism Phenotyping v2) predicts the possible impact of an amino acid substitution on the structure and function of a protein based on physical and comparative considerations. (DOI: 10.1002/0471142905.hg0720s76)
- HumDiv: Compiled from all damaging alleles known to cause Mendelian diseases, together with differences between human proteins and their closely related mammalian orthologues, which are assumed to be non-damaging. Even mildly deleterious alleles are treated as damaging, which suits the evaluation of rare alleles.
- HumVar: Trained on human disease-causing mutations from UniProtKB and common human nsSNPs (MAF > 1%) without disease-associated annotation. It distinguishes mutations with drastic effects from the remaining human variation, including mildly deleterious alleles, and can be used for the diagnosis of Mendelian diseases.
ESM1b (Evolutionary Scale Modeling) is a protein language model that uses deep learning to predict the effects of missense variants, including variants in alternative protein isoforms. By analyzing the evolutionary information encoded in amino acid sequences, ESM1b provides insight into the potential pathogenicity of genetic variants. (DOI: 10.1038/s41588-023-01465-0)
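In the cited study, an ESM1b variant effect is scored as a log-likelihood ratio (LLR) between the variant and the WT residue at the mutated position. The sketch below computes that ratio. It assumes the per-position residue probabilities have already been obtained from the model; running ESM1b itself is not shown. The probabilities in the example are invented for illustration.

```python
import math
from typing import Mapping


def esm1b_llr(residue_probs: Mapping[str, float], wt: str, variant: str) -> float:
    """LLR = log P(variant) - log P(WT) at the mutated position.

    More negative values mean the model finds the substitution less plausible.
    """
    return math.log(residue_probs[variant]) - math.log(residue_probs[wt])


# Hypothetical model output for one position (not real ESM1b probabilities).
probs = {"L": 0.60, "P": 0.001, "I": 0.20}
print(round(esm1b_llr(probs, "L", "P"), 2))  # -6.4
print(round(esm1b_llr(probs, "L", "I"), 2))  # -1.1
```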
AlphaMissense predicts the effects of missense mutations on protein structure and function. The machine learning model builds on Google DeepMind's AlphaFold and adapts it to incorporate information from multiple sequence alignments and genetic databases. Missense variants scoring below 0.34 are likely benign, those scoring between 0.34 and 0.564 are ambiguous, and those scoring above 0.564 are likely pathogenic. This classification is shown in the Class column. (DOI: 10.1126/science.adg7492) The Optimized column classifies scores higher than 0.955 as likely pathogenic, scores lower than 0.784 as likely benign, and all others as ambiguous. These thresholds were selected based on their balanced accuracy.
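The two AlphaMissense classifications above can be written as threshold functions. Scores exactly at a boundary (e.g., 0.34 or 0.564) are placed in the ambiguous class. That is an assumption, since the text does not specify boundary handling.

```python
def alphamissense_class(score: float) -> str:
    """Standard AlphaMissense thresholds (Class column)."""
    if score < 0.34:
        return "likely benign"
    if score > 0.564:
        return "likely pathogenic"
    return "ambiguous"


def alphamissense_optimized(score: float) -> str:
    """Alternative thresholds used for the Optimized column."""
    if score > 0.955:
        return "likely pathogenic"
    if score < 0.784:
        return "likely benign"
    return "ambiguous"


for score in (0.20, 0.45, 0.90, 0.97):
    print(score, alphamissense_class(score), "|", alphamissense_optimized(score))
```

Under these rules, a score of 0.90 is likely pathogenic under the standard classes but only ambiguous under the optimized thresholds. Likewise, a score of 0.45 moves from ambiguous to likely benign.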
FATHMM (Functional Analysis through Hidden Markov Models) is a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. The coding variants were analyzed using the disease-specific option: Nervous System. (DOI: 10.1002/humu.22225)
PremPS is a computational method that evaluates the effect of a missense mutation on protein stability. PremPS uses evolutionary- and structure-based features and is parameterized on a balanced dataset with equal numbers of stabilizing and destabilizing mutations. The 0 ns frame of the WT simulation was used as the reference WT structure for calculating the Gibbs free energy difference (ΔΔG) between the WT and each missense variant. The ΔΔG cut-offs were set to ≥1.0 kcal/mol for destabilizing mutations and ≤-1.0 kcal/mol for stabilizing mutations. Variants with ΔΔG values between -0.5 and 0.5 kcal/mol were considered neutral or likely benign. (DOI: 10.1371/journal.pcbi.1008543)
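The PremPS cut-offs above can be expressed as a small classifier. The text does not say how values between the neutral band and a cut-off (e.g., 0.7 kcal/mol) are labeled. This sketch marks them "unclassified" as an assumption. The function name is hypothetical.

```python
def classify_ddg(ddg: float, destabilizing_cutoff: float = 1.0,
                 neutral_half_width: float = 0.5) -> str:
    """Classify a ΔΔG (kcal/mol); positive values mean destabilizing."""
    if ddg >= destabilizing_cutoff:
        return "destabilizing"
    if ddg <= -destabilizing_cutoff:
        return "stabilizing"
    if -neutral_half_width <= ddg <= neutral_half_width:
        return "neutral or likely benign"
    # Between the neutral band and a cut-off: no label given in the text (assumption).
    return "unclassified"


for ddg in (1.2, -1.0, 0.3, 0.7):
    print(ddg, classify_ddg(ddg))
```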
FoldX calculates the relative free energy of folding between the WT and missense variants. The Gibbs free energy differences (ΔΔG values) are only comparable between the WT and the variant system used in the same calculation. The calculations were based on the 150 ns WT simulations: a missense variant structure was generated for every representative nanosecond, and the averages over the three replica molecular dynamics (MD) simulation trajectories were computed. The ΔΔG cut-offs were set to ≥2.0 kcal/mol for destabilizing mutations and ≤-2.0 kcal/mol for stabilizing mutations. Variants with ΔΔG values between -0.5 and 0.5 kcal/mol were considered neutral or likely benign. (DOI: 10.1093/nar/gki387)
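The averaging step above can be sketched as follows, assuming the per-nanosecond FoldX ΔΔG values have already been computed for each replica (for instance with FoldX's BuildModel command). Running FoldX itself is not shown. The function name and example values are hypothetical; a real replica would contain about 150 values.

```python
from statistics import mean
from typing import Sequence


def foldx_replica_average(per_ns_ddg: Sequence[Sequence[float]]) -> float:
    """Average ΔΔG within each replica over its per-nanosecond structures, then across replicas."""
    return mean(mean(replica) for replica in per_ns_ddg)


# Hypothetical per-nanosecond ΔΔG values (kcal/mol), shortened to two per replica.
replicas = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
print(foldx_replica_average(replicas))  # 2.0
```

Averaging within replicas first keeps each trajectory equally weighted even if the replicas contain different numbers of frames.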
Rosetta calculates the relative free energy of folding between the WT and missense variants. The Gibbs free energy differences (ΔΔG values) are only comparable between the WT and the variant system used in the same calculation. The Rosetta Cartesian ΔΔG protocol was applied to calculate the ΔΔG between the WT and the missense variants. The 0 ns WT structure from each replica simulation was first relaxed with the full-atom FastRelax protocol (50 repeats each). The lowest-energy relaxed structure was used for the subsequent ΔΔG calculations with the ref2015 scoring function in Cartesian space. The reported ΔΔG value is the lowest energy difference between the relaxed WT and the Rosetta-generated variant ΔG values over five prediction iterations for each individual run. Rosetta Energy Units (REU) were converted to kcal/mol using the scaling factor 2.94. The calculations were run with RosettaDDGPrediction, a customizable Python wrapper available under the GNU General Public License v3.0 (DOI: 10.1002/pro.4527). The ΔΔG cut-offs were set to ≥2.0 kcal/mol for destabilizing mutations and ≤-2.0 kcal/mol for stabilizing mutations. Variants with ΔΔG values between -0.5 and 0.5 kcal/mol were considered neutral or likely benign. (DOI: 10.3389/fbioe.2020.558247)
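The post-processing described above (lowest energy difference over five iterations, then unit conversion) can be sketched as follows. Two points are assumptions. First, "lowest energy difference" is read as the minimum of the variant ΔG minus the relaxed WT ΔG. Second, the 2.94 scaling factor is applied by dividing REU values by it. The function name and energies are hypothetical, and RosettaDDGPrediction itself is not called.

```python
from typing import Sequence

REU_PER_KCAL_MOL = 2.94  # scaling factor from the text; dividing by it is an assumption


def rosetta_ddg_kcal(wt_dg_reu: float, variant_dg_reu: Sequence[float]) -> float:
    """Lowest (variant - WT) energy difference over the iterations, converted to kcal/mol."""
    lowest_ddg_reu = min(dg - wt_dg_reu for dg in variant_dg_reu)
    return lowest_ddg_reu / REU_PER_KCAL_MOL


# Hypothetical ΔG values (REU): one relaxed WT and five variant prediction iterations.
wt_dg = -100.0
variant_dgs = [-93.0, -94.12, -95.0, -94.5, -93.8]
print(round(rosetta_ddg_kcal(wt_dg, variant_dgs), 2))  # 1.7
```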
Foldetta is the mean of the FoldX and Rosetta ΔΔG predictions for a given variant. FoldX and Rosetta show superior performance in distinguishing pathogenic from benign variants, and their consensus score, Foldetta, was shown to correlate more accurately with the functional impacts measured in deep mutational scanning experiments. The ΔΔG cut-offs were set to ≥2.0 kcal/mol for destabilizing mutations and ≤-2.0 kcal/mol for stabilizing mutations. Variants with ΔΔG values between -0.5 and 0.5 kcal/mol were classified as neutral or likely benign. (DOI: 10.1002/pro.4688)
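Foldetta, as described above, averages the two ΔΔG values and then applies the same ±2.0/±0.5 kcal/mol cut-offs used for FoldX and Rosetta. This sketch labels values between the neutral band and a cut-off "unclassified". That is an assumption, since the text does not name that range. The function names are hypothetical.

```python
def foldetta(foldx_ddg: float, rosetta_ddg: float) -> float:
    """Consensus ΔΔG (kcal/mol): the mean of the FoldX and Rosetta predictions."""
    return (foldx_ddg + rosetta_ddg) / 2.0


def classify_stability(ddg: float, destabilizing_cutoff: float = 2.0,
                       neutral_half_width: float = 0.5) -> str:
    """Cut-offs shared by FoldX, Rosetta and Foldetta; positive ΔΔG means destabilizing."""
    if ddg >= destabilizing_cutoff:
        return "destabilizing"
    if ddg <= -destabilizing_cutoff:
        return "stabilizing"
    if -neutral_half_width <= ddg <= neutral_half_width:
        return "neutral or likely benign"
    return "unclassified"  # not labeled in the text (assumption)


for foldx, rosetta in ((3.0, 1.8), (0.2, -0.4), (-2.5, -2.0), (1.0, 0.6)):
    score = foldetta(foldx, rosetta)
    print(foldx, rosetta, round(score, 2), classify_stability(score))
```

In the first example, Rosetta alone (1.8 kcal/mol) would fall short of the destabilizing cut-off. The consensus value of 2.4 kcal/mol crosses it.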
PROVEAN (Protein Variation Effect Analyzer) v1.1 calculates a delta alignment score by comparing the reference WT and variant versions of a protein query sequence. The sequences for comparison are collected from the NCBI NR protein database using BLAST. PROVEAN clusters the BLAST hits with CD-HIT at a 75% global sequence identity threshold. The top 30 clusters of closely related sequences form the sequence set used to generate the prediction. The default score threshold of -2.5 is applied for the binary classification of variants: variants with PROVEAN scores at or below this threshold are classified as deleterious, while those with scores above it are considered neutral. (DOI: 10.1371/journal.pone.0046688)
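The final PROVEAN scoring step can be sketched in simplified form. The example assumes the per-sequence delta alignment scores (variant alignment score minus WT alignment score) are already grouped by cluster. Deltas are averaged within each cluster and then across clusters, and the -2.5 threshold is applied. BLAST searching, CD-HIT clustering, and the alignment scoring itself are not shown. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence

PROVEAN_THRESHOLD = -2.5


def provean_score(cluster_deltas: Sequence[Sequence[float]]) -> float:
    """Average the delta scores within each cluster, then across clusters (simplified)."""
    return mean(mean(deltas) for deltas in cluster_deltas)


def provean_prediction(score: float, threshold: float = PROVEAN_THRESHOLD) -> str:
    # A score equal to the threshold is treated as deleterious.
    return "deleterious" if score <= threshold else "neutral"


# Hypothetical delta scores for two clusters of supporting sequences.
score = provean_score([[-3.0, -5.0], [-1.0]])
print(score, provean_prediction(score))  # -2.5 deleterious
```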
REVEL (Rare Exome Variant Ensemble Learner) is an ensemble method for assessing the pathogenicity of missense variants. It combines scores from 13 prediction tools: SiPhy, phyloP, phastCons, MutPred, FATHMM v2.3, VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, and GERP++. The score ranges from 0 to 1, and here any variant with a score of ≥0.5 was classified as pathogenic. The pre-calculated REVEL predictions (v1.3, May 3, 2021) were not computed specifically for SynGAP1 isoform α2 (UniProt Q96PV0-1; ENST00000646630) but for other transcripts of the SYNGAP1 gene. Our table lists the predictions for missense mutations at the equivalent GRCh38 positions. (DOI: 10.1016/j.ajhg.2016.08.016)
DOI: link to the relevant scientific article.
SynGAP Structural Annotation
Structural assessment of missense mutations was performed on the segments of the SynGAP protein covered by the N-terminal model. These evaluations are derived from meticulous manual inspection of the in silico variant molecular dynamics simulations. A missense mutation may fall into multiple categories (X) of the structural annotation classification, but each variant ultimately receives a single status: potentially pathogenic, potentially benign, or uncertain. A variant was given an uncertain status if the mutation was introduced at the truncated ends of the model. No variant was assigned pathogenic based on its placement at the C2-membrane or GAP-Ras interface unless the mutation also had another structural effect.
- Secondary: Missense mutations that lead to changes or disruptions in secondary structure elements are potentially pathogenic. Here, β-hairpins are considered secondary structure elements in addition to β-sheets, α-helices, and loops/turns.
- Tertiary Bonds: Missense mutations that result in alterations to tertiary structure bonding, particularly the disruption or formation of hydrogen bonds or salt bridges between different secondary structure elements, are potentially pathogenic.
- Inside-Out: Missense mutations that cause buried residues to become solvent-exposed and/or misplaced, also known as inside-out effects, are potentially pathogenic.
- GAP-Ras Interface: Missense mutations that may alter or disrupt the association between the GAP domain and RasGTPase are noted, but this is not used as a criterion for assigning pathogenicity.
- At Membrane: Missense mutations that may alter or disrupt the association between the C2 domain and the membrane are noted, but this is not used as a criterion for assigning pathogenicity.
- No Effect: Missense mutations that, based on the variant simulations, do not cause or are unlikely to cause detrimental effects on the protein structure are potentially benign.
MD alerts: These alerts are given for missense variants whose molecular dynamics (MD) simulation results may not be entirely reliable. In these cases, the mutation shows a clearly muted structural effect, likely because the simulation starts from a fully folded conformation.
Verdict: Based on the structural effects of the missense mutations, variants are categorized as 1) Potentially Benign, if no apparent negative effect is observed on the structure or folding; 2) Potentially Pathogenic, if a negative effect is observed; or 3) Uncertain, if the mutation lies at a prematurely truncated end of the model or its projected effect concerns complex formation with the membrane or Ras-GTPase.
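The verdict itself comes from manual inspection of the simulations. The precedence between the annotation categories and the single final status, as described in this section, can still be summarized as a checklist. The sketch below encodes one reading of that precedence. The truncated-end check comes first, then the structural categories, then the interface notes. The field names and that ordering are assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class StructuralAnnotation:
    """Hypothetical record of the categories (X) ticked for one variant."""
    secondary: bool = False
    tertiary_bonds: bool = False
    inside_out: bool = False
    gap_ras_interface: bool = False
    at_membrane: bool = False
    at_truncated_end: bool = False  # mutation introduced at a truncated end of the model


def verdict(annotation: StructuralAnnotation) -> str:
    """Collapse possibly multiple categories into a single status (assumed precedence)."""
    if annotation.at_truncated_end:
        return "Uncertain"
    if annotation.secondary or annotation.tertiary_bonds or annotation.inside_out:
        return "Potentially Pathogenic"
    # Interface placement alone is noted but never used to assign pathogenicity.
    if annotation.gap_ras_interface or annotation.at_membrane:
        return "Uncertain"
    return "Potentially Benign"  # corresponds to the No Effect category


print(verdict(StructuralAnnotation(tertiary_bonds=True, at_membrane=True)))  # Potentially Pathogenic
```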
Description of Structural Effects: An exhaustive examination of the structural impacts of missense mutations on the 3D structure of SynGAP is presented. This analysis mainly focuses on observations from the SynGAP-solvent molecular dynamics simulation trajectories that aim to detect alterations in the folding stability or potential misfolding events. Notably, the simulations are not expected to result in a completely unfolded state, as they initiate with the protein in its fully folded form. Further insights are derived from the estimated proximity to the inner post-synaptic membrane and RasGTPase.