A Comprehensive In Silico Analysis of Deleterious SNPs of Paraplegin Protein Associated with Hereditary Spastic Paraplegia through Mitochondrial Dysfunction

Hereditary spastic paraplegia (HSP) is a heterogeneous neurological disorder characterized primarily by progressive spasticity. Paraplegin is a mitochondrial protein, and mutations in it can lead to HSP. In this study, in silico analysis was carried out to identify pathogenic variants of SPG7, the gene encoding paraplegin. To find novel pathogenic mutations, missense and splicing variants were collected from the gnomAD database and passed through a stringent analysis using a variety of bioinformatic tools. The resulting list of mutations was then compared against ClinVar. Altogether, 14 missense mutations and 18 splicing mutations were obtained, and these were predicted to have the potential to disrupt the normal structural and functional properties of paraplegin.
Keywords: in silico analysis, hereditary spastic paraplegia (HSP), paraplegin, SNPs, SPG7


Introduction
Hereditary spastic paraplegia (HSP) is a genetic neurodegenerative syndrome, also known as Strumpell-Lorrain disease. Its major hallmark is axonal degeneration, which leads to abnormalities in the corticospinal region. This corticospinal dysfunction ultimately causes weakness and spasticity in the lower limbs [1,2].
The heterogeneity of the clinical manifestations of HSP suggests that some HSP-associated proteins contribute to several processes or pathways at the cellular and molecular level, so that mutations may cause defects in multiple processes. Because HSP genes are also associated with mitochondrial functions, mitochondrial abnormalities have been implicated in the development of neurological disorders [3].
Mutations in paraplegin (SPG7) cause an autosomal recessive form of HSP. Paraplegin is a metalloprotease; the SPG7 gene has 17 exons, and the protein comprises 795 amino acids and contains an AAA domain [4]. The reported prevalence is 2-6/100,000 [5]. Paraplegin occurs primarily together with its homolog AFG3L2, a protease found in mitochondria. Its main functions include protein quality control, enzyme processing and the maturation of mitochondrial proteins, and it also plays a role in ribosomal assembly. Mutations in paraplegin can lead to defects in mitochondrial oxidative phosphorylation, which have been observed in muscle biopsies of HSP patients [4,6].
Over the past years, massive efforts have been made and high-throughput methods have been developed to unravel the complicated heterogeneity, mutations, and functional variants associated with different diseases. Well-defined and organized strategies that can handle this large volume of genetic information are imperative in this research field. With such a wide variety of variants available, identifying novel pathogenic variants is a serious challenge, and it obliges scientists to identify potentially damaging variants in view of all available evidence. For this purpose, a variety of computer-aided bioinformatic tools are available that can predict the likely pathogenicity of variants [7].
BioScientific Review Volume 2 Issue 2, 2020
Computational tools designed for bioinformatic analysis can be used to predict whether an SNP will lead to disease. Many projects have used such tools to analyze and predict the effects of genetic variations in order to understand the genetic basis of different diseases. According to one estimate, there are more than four million proven single nucleotide polymorphisms (SNPs) in the human genome. This vast volume of available genetic variation aids in analyzing the basis of different diseases with bioinformatic tools, which are a valuable resource for the functional analysis of the human genome [8].
The current study was carried out to find novel mutations of the gene SPG7 related to HSP through mitochondrial dysfunction with the help of in silico tools.

Data Retrieval
To identify mutations that lead to HSP, the variants of the SPG7 gene associated with HSP through mitochondrial dysfunction were retrieved from gnomAD (Genome Aggregation Database). The protein name was used to obtain the list of variants. Filters such as loss of function, nonsynonymous, missense and heterozygous were applied on the gnomAD site [9] to get the desired variant list, which was then downloaded in VCF format. The protein sequence of paraplegin was retrieved from NCBI.

Variant Selection
The variants obtained from gnomAD were filtered further on the basis of their allelic frequency. To select a cut-off value, the allelic frequencies of 100 pathogenic mutations related to the autosomal recessive form of HSP were collected from the Variation Viewer of NCBI. Based on these, an allelic frequency cut-off of <0.002 was applied to the gnomAD variants, and the variants below 0.002 were subjected to CADD analysis.
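The allelic frequency step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually run on the gnomAD export: the record layout, the field name `allele_frequency` and the variant identifiers are hypothetical, and only the 0.002 cut-off comes from the text.

```python
from typing import Iterable

AF_CUTOFF = 0.002  # allelic frequency cut-off stated in the text


def filter_rare_variants(variants: Iterable[dict], cutoff: float = AF_CUTOFF) -> list:
    """Keep variants whose allelic frequency is strictly below the cut-off."""
    return [v for v in variants if v["allele_frequency"] < cutoff]


# Hypothetical records for illustration only (not taken from gnomAD).
example_variants = [
    {"id": "var_A", "allele_frequency": 0.0001},
    {"id": "var_B", "allele_frequency": 0.0150},
    {"id": "var_C", "allele_frequency": 0.0019},
]

rare = filter_rare_variants(example_variants)
print([v["id"] for v in rare])
```

Only the records below the cut-off would move on to the next step; a variant sitting exactly at 0.002 is excluded because the comparison is strict.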

CADD Analysis
CADD [10] was used to score the variants according to their predicted deleteriousness. Various filters were applied to get the desired variants. A cut-off value was applied to the C-score (PHRED score >15), and the selected variants were analyzed further with a variety of bioinformatic tools. The analysis was aimed at finding pathogenic mutations of paraplegin at missense and splicing sites.
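To give a sense of what a PHRED cut-off of 15 means, the sketch below assumes the usual PHRED-like scaling of CADD scores, in which the scaled score equals -10·log10(rank/total), so that 10 corresponds to the top 10% and 20 to the top 1% of ranked substitutions. The function names are illustrative. Because the cut-off appears in the text both as ">15" and as "≥15", the inclusive form used here is an assumption.

```python
def phred_to_top_fraction(phred: float) -> float:
    """Fraction of ranked substitutions at or above a PHRED-like scaled score."""
    return 10 ** (-phred / 10)


def passes_cadd_cutoff(phred: float, cutoff: float = 15.0) -> bool:
    """Inclusive cut-off (>=); switch to > if the strict reading is intended."""
    return phred >= cutoff


for score in (10, 15, 20):
    print(score, f"{phred_to_top_fraction(score):.4f}")
```

Under this scaling, a score of 15 corresponds to roughly the top 3% of possible substitutions as ranked by CADD.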

Missense Variant Analysis
After C-scoring, a filter was applied to the CADD file to retrieve only the missense variants of paraplegin, which were then analyzed with PhD-SNPg [11], PredictSNP2 [12], UMD-Predictor [13], SNPs&GO [14] and PROVEAN [15] for further validation of the results. The variant list was uploaded to each online server in either CSV or VCF format, and each tool reported its output according to its own algorithmic criteria. The pathogenic variants obtained were then compared against ClinVar [16] to find out whether they had been reported in any previous article or research work.
Department of Life Sciences Volume 2 Issue 2, 2020

3D Modelling of Paraplegin
To check the effect of certain mutations on protein structure, function and stability, structural models of the protein were generated with I-TASSER. Based on C-score and RMSD, the model with a high confidence level was selected. The C-score is a confidence measure used to assess the quality of the predicted structural models of a protein [17]. Chimera was used for visualization.

Stability Analysis
To analyze the effect of mutations at the amino acid level on the stability and functional properties of paraplegin, I-Stable was used [18]. The protein sequence and each amino acid substitution were submitted to the tool one variant at a time, in order to study whether each variant had a stabilizing or destabilizing effect on the protein. Pathogenic mutations can destabilize the normal structural and/or functional activity of the protein.

Splicing Variant Analysis
The analysis of splicing SNPs of paraplegin was performed using several computer-aided tools (Human Splice Finder, SPICE and Spliceman). Mutations in splicing regions have the potential to break an existing splice site or create a new one, thus disrupting the normal splicing process. To find novel mutations, the mutations obtained in this study were checked against those already reported in ClinVar.

Clashes and Contacts
Clashes and contacts were also analyzed based on van der Waals (VDW) radii to check whether an introduced mutation showed unfavorable interactions with nearby residues. Contacts indicate any direct interaction between residues, polar or non-polar, whether favorable or not [19].
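The VDW-based criterion can be sketched as an overlap calculation: the sum of the two atomic radii minus the distance between the atom centers, minus an optional allowance. The cut-offs used below (overlap of at least 0.6 Å for a clash, at least -0.4 Å for a contact) are what we understand to be Chimera's defaults and should be treated as assumptions, as should the radii and coordinates in the example; Chimera also applies a separate allowance to potentially hydrogen-bonded pairs, which is reduced here to a single parameter.

```python
import math


def vdw_overlap(coord_a, coord_b, radius_a, radius_b, allowance=0.0):
    """Overlap (in angstroms) = sum of VDW radii - center distance - allowance."""
    distance = math.dist(coord_a, coord_b)
    return radius_a + radius_b - distance - allowance


def classify_pair(overlap, clash_cutoff=0.6, contact_cutoff=-0.4):
    """Label an atom pair; cut-offs are assumed defaults, not values from the paper."""
    if overlap >= clash_cutoff:
        return "clash"
    if overlap >= contact_cutoff:
        return "contact"
    return "none"


# Illustrative atom pairs: two atoms with a radius of 1.7 angstroms each.
for separation in (2.5, 3.6, 4.5):
    overlap = vdw_overlap((0.0, 0.0, 0.0), (separation, 0.0, 0.0), 1.7, 1.7)
    print(separation, classify_pair(overlap))
```

A pair labelled "contact" is simply close; only pairs with substantial overlap are flagged as potentially unfavorable.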

Hydrophobicity Analysis
Hydrophobicity analysis was performed according to kdHydrophobicity scale using Chimera [20].
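As a rough illustration of what the kdHydrophobicity attribute encodes, the sketch below lists the Kyte-Doolittle hydropathy values per residue and computes the change introduced by a substitution. The helper name and the example substitution are hypothetical and are not variants from this study.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(wild_type: str, mutant: str) -> float:
    """Mutant minus wild-type hydropathy for one-letter residue codes."""
    return KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type]


# Hypothetical substitution: alanine replaced by aspartate.
print(f"{hydropathy_change('A', 'D'):+.1f}")
```

A negative change means the substituted residue is less hydrophobic than the original, which can matter when the position is buried in the protein core.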

3D Structure of Paraplegin
Structural models of paraplegin were obtained through I-TASSER. Among the five models produced, the first was selected as the best model of the paraplegin protein (Figure 1). For SPG7, the estimated C-score was -2.06, the estimated TM-score was 0.47±0.15 and the estimated RMSD was 13.5±4.0 Å.

Missense Variant Analysis of SPG7
Variants of the SPG7 gene were retrieved from gnomAD, giving a total of 2021 variants. After the allelic frequency filter (< 0.002) was applied, 1977 variants remained and were uploaded to CADD. After CADD analysis, the missense and PHRED score (≥15) filters were applied and 463 variants were obtained.

Deleterious missense SNPs identified through different bioinformatic tools.
The obtained variants were further analyzed using different bioinformatic tools, namely SNPs&GO, PhD-SNPg, PredictSNP2, PROVEAN and UMD-Predictor. A specific set of filters (deleterious, pathogenic, and probably pathogenic) was applied to retain only the highly pathogenic variants. Thirty-two variants were obtained. The purpose of these filters was to retain those variants that were predicted by all five in silico tools to affect the activity of the gene (Table 1).
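The requirement that a variant be flagged by all five tools amounts to a set intersection. The sketch below assumes that each tool's output has already been reduced to a set of variant identifiers predicted to be damaging; the identifiers shown are placeholders, not results from Table 1.

```python
def consensus_variants(calls_by_tool: dict) -> set:
    """Return the variants flagged as damaging by every tool."""
    tool_calls = list(calls_by_tool.values())
    if not tool_calls:
        return set()
    return set.intersection(*tool_calls)


# Placeholder identifiers for illustration only.
calls = {
    "SNPs&GO": {"var_1", "var_2", "var_3"},
    "PhD-SNPg": {"var_1", "var_3", "var_4"},
    "PredictSNP2": {"var_1", "var_3"},
    "PROVEAN": {"var_1", "var_2", "var_3"},
    "UMD-Predictor": {"var_1", "var_3", "var_5"},
}

print(sorted(consensus_variants(calls)))
```

Requiring agreement from every tool is conservative: it lowers the number of false positives at the cost of discarding variants on which the predictors disagree.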

Stability predictions.
The variants were further analyzed through I-Stable to check the effect of these mutations on the stability of protein. These mutations have the potential to cause destabilization in the normal structural and functional properties of the protein.
The I-Stable tool shows whether the introduced mutation will decrease the stability of the protein or not. The filter of "decrease" was applied and only the variants with a high potential to decrease the stability of protein were selected. Nine variants out of thirty-two listed in the variant list were finally obtained (Table 2, Figure 2).

Clashes and Contacts
Clashes and contacts were analyzed to study the interaction of mutant residue with the neighboring residues using Chimera. Four missense mutations exhibited interaction with neighboring residues. These interactions have the potential to destabilize the structure of protein, thus leading to the disruption of the normal protein structure (Figure 3).

Hydrophobicity Analysis
Chimera was used to carry out hydrophobicity analysis. In this process, amino acid residues were given characteristic features and properties according to kdHydrophobicity. Values were assigned to each amino acid residue based on the hydrophobicity scale generated by Kyte and Doolittle (Figure 4).

Splicing SNPs Identified through HSF, SPICE and SPLICEMAN
The variants were retrieved from gnomAD, giving a total of 2403 variants. After the allelic frequency (< 0.002) and PASS filters were applied, 2351 variants were left and were scored with CADD. The CADD score filter (≥ 15) was applied and a total of 148 variants were obtained, which were further filtered by category (Canonical-Splice, Splice donor and Splice acceptor). Ultimately, 23 variants were obtained. These variants were further analyzed with Human Splice Finder, SPICE v2.15 and Spliceman to check the effect of each mutation on the splicing regions of paraplegin (Table 4).
In HSF, the threshold value is set to 65. If the wild-type score is above the threshold and the score variation is below -10%, the mutation is predicted to break the splice site. If the wild-type score is below the threshold and the score variation is above +10%, the mutation is predicted to create a new splice site. An asterisk (*) next to an amino acid substitution indicates that the mutation has already been reported in ClinVar, while all other mutations are novel.
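The HSF decision rule described above can be written as a small function. This is a sketch under stated assumptions: the score variation is taken to be the relative change of the mutant score with respect to the wild-type score, in percent, and the function name, return labels and example scores are illustrative rather than HSF output.

```python
def interpret_hsf(wt_score: float, mut_score: float,
                  threshold: float = 65.0, delta_pct: float = 10.0) -> str:
    """Apply the threshold and +/-10% variation rule to one candidate site."""
    if wt_score <= 0:
        raise ValueError("wild-type score must be positive to compute a variation")
    # Assumed definition: relative change versus the wild-type score, in percent.
    variation = (mut_score - wt_score) / wt_score * 100.0
    if wt_score > threshold and variation < -delta_pct:
        return "site broken"
    if wt_score < threshold and variation > delta_pct:
        return "new site"
    return "no predicted change"


# Illustrative score pairs (wild type, mutant).
for wt, mut in [(85.0, 60.0), (50.0, 72.0), (80.0, 78.0)]:
    print(wt, mut, interpret_hsf(wt, mut))
```

The rule is asymmetric by design: a site can only be "broken" if it was strong enough to count as a site in the wild type, and a "new site" can only arise where none was predicted before.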

Discussion
HSP is a clinically and genetically heterogeneous group of disorders associated with lower limb weakness followed by progressive spasticity, and the condition may worsen over time. Currently, there is no particularly effective therapy for HSP, although anti-spastic drugs such as diazepam and baclofen play an important role in lessening the complications associated with the disease (including pain and fractures) and in improving the lives of patients [21,22]. There is a pressing need to understand the pathophysiology of this disease and to analyze in detail the genetic mutations leading to HSP.
The identification of novel variants is becoming increasingly important in genetic testing for a particular disease, because it enhances the sensitivity of the testing process. Gene sequencing is the most commonly used approach for this purpose; however, due to the vast variety of newly found variants, it is difficult to analyze the pathogenicity of each variant. Moreover, not all variants alter the functional properties of the protein [23,24].
In this study, a list of variants of the SPG7 gene associated with HSP was retrieved from gnomAD and analyzed to find novel pathogenic missense and splicing variants of paraplegin. The allelic frequency filter was applied and the gnomAD file was then uploaded to the CADD server. After CADD analysis, missense variants and splicing variants were sorted out by applying filters. The results for missense variants were further confirmed with several established bioinformatic tools for missense variant analysis (SNPs&GO, PhD-SNPg, PredictSNP2, UMD-Predictor and PROVEAN). After a detailed analysis of the pathogenicity of the missense variants, a total of 9 mutations were finally obtained which were predicted to be highly pathogenic by all in silico tools used in the analysis. The mutations of SPG7 were analyzed using NM_003119.4.
To analyze the effect of missense mutations on the stability of the protein related to HSP through mitochondrial dysfunction, the I-Stable tool was used. Clashes and contacts were further analyzed in Chimera to see the type of interaction between the substituted residue (after mutation) and the neighboring residues (Figure 3).
To analyze the impact of mutations on the splicing regions of the gene, the bioinformatic tools Spliceman, Human Splice Finder and SPICE were used. After thorough analysis, 25 mutations were identified for SPG7 through CADD analysis. All these mutations have the potential to produce a deleterious effect in the autosomal recessive form of HSP through mitochondrial dysfunction.
Various empirical techniques are used to analyze the distribution of the twenty amino acids across different conformations in the three-dimensional structure of a specific protein. With such a scheme, the probability that the secondary structures of a protein adopt a general shape can be evaluated. Amino acids can be either hydrophilic or hydrophobic based on their chemical characteristics. For example, alanine (A), valine (V), leucine (L), proline (P), phenylalanine (F) and cysteine (C) are considered to be hydrophobic, while lysine and arginine are hydrophilic, both carrying a positive charge at neutral pH [25]. SOAP is an online computer program written in the C language that assigns a hydropathy value to every residue in a given amino acid sequence. On this scale, the higher the value, the more hydrophobic the residue [26].
Pathogenic mutations found in splicing regions have the potential to disrupt the normal splicing process. They may be present either in exons or in introns. Such mutations may break an existing splice site, create a new one, or activate cryptic sites, which ultimately affects the protein. They may also alter the enhancer or silencer elements involved in the splicing process and change the structure of the mRNA [27]. In this study, 23 splicing mutations were predicted to have a pathogenic effect on the normal splicing of paraplegin. Out of these 23, 18 mutations have not been reported in any previous work for their pathogenicity in causing HSP.
In this study, after an elaborate series of analyses, each variant was checked in ClinVar to establish the novelty of the work. This study found a total of 14 missense mutations and 18 splicing mutations which were considered to be highly pathogenic. Their pathogenicity has not been reported previously in any scientific work; thus, they are considered novel.

Conclusion
HSP, a neurodegenerative disease, comprises a number of disorders in which lower limb weakness associated with progressive spasticity is the initial symptom. The proteins associated with mitochondrial dysfunction are directly involved in causing HSP. In this study, mutation analysis of paraplegin (SPG7) was carried out with bioinformatic tools, which were used to analyze the effect of missense and splicing mutations on the structure and stability of the protein. The results showed that 14 missense mutations and 18 splicing mutations have the potential to cause the autosomal recessive form of HSP and its associated complications. The mutations reported in this work can be further examined through in vitro analysis.