In Silico Analysis of the Variants of Uncertain Significance in AP4S1 Gene

Hereditary spastic paraplegia is a group of heterogeneous neurological disorders with genetic etiologies. It is characterized by spasticity in the lower limbs along with other neurological complications. Sequencing technologies have identified numerous disease-causing variants in the AP4S1 gene. However, many very low frequency variants in AP4S1 also have the potential to cause hereditary spastic paraplegia when inherited in a recessive manner. This study was designed to identify these potential disease-causing variants in the AP4S1 gene using in silico tools, which predict the effects of variants on protein function and pre-mRNA splicing. To predict the pathogenicity of missense variants, PhD-SNPg, PROVEAN, SNPs&GO, and CADD were used. Splice site variants were analyzed using Spliceman, SPiCE, and Human Splice Finder (HSF). In silico analysis identified six missense and five splice site variants with the potential to cause hereditary spastic paraplegia.


Introduction
Hereditary spastic paraplegia (HSP) is a clinically and genetically heterogeneous group of neurological diseases. It impairs the corticospinal tracts and results in progressive lower limb spasticity and weakness [1]. The disease is inherited, and the inheritance pattern can be autosomal recessive, autosomal dominant, or X-linked recessive; causal variants can also arise de novo. HSP is categorized into a complicated (complex) form and an uncomplicated (pure) form [2]. Clinical abnormalities associated with it include ataxia, thin corpus callosum, peripheral neuropathy, and cognitive dysfunction. The disease is rare across human populations, with a prevalence of ~3-9 per 100,000 [3].
Mutations in more than sixty genes are known to cause HSP [4,5]. A number of pathogenic mechanisms are involved in this disease, including intracellular active transport, endoplasmic reticulum shaping, lipid synthesis and metabolism, the endolysosomal trafficking pathway, autophagy, myelination, motor-based transport, and mitochondrial function [3,6,7,8]. The AP4S1 protein is part of the heterotetrameric AP4 complex, and variants of AP4S1 are known to cause HSP [9,10]. The AP4 complex is expressed in neuronal cells throughout embryological and postnatal development. It plays a key role in protein processing, sorting, trafficking, and vesicle formation.
To predict the pathogenicity of missense variants in the AP4S1 gene, the CADD, PhD-SNPg, PROVEAN, and SNPs&GO tools were used. CADD is used for annotating and interpreting human genetic variants. Its major benefit is that it objectively integrates many annotations for every single variant and transforms them into a single PHRED-scaled measure, the C-score. CADD also prioritizes variants across a range of genetic architectures, which is beyond the scope of any currently known single annotation method, and it supports both the GRCh37 and GRCh38 genome assemblies [11]. PhD-SNPg is a machine learning method based only on sequence-based features. It is used for the rapid analysis of single nucleotide polymorphisms and also serves as a benchmark for the development of many other tools. Its user-friendly interface predicts the effect of single nucleotide variants in both coding and non-coding regions [12]. The PROVEAN (Protein Variation Effect Analyzer) web server makes predictions about single amino acid and nucleotide substitutions for both the human and mouse genomes, and its current version has an improved running time. PROVEAN uses a preset cutoff value of -2.5 for high balanced accuracy, but users can set their own cutoff value to attain higher sensitivity or specificity [13]. The SNPs&GO server uses protein functional annotation to predict damaging and deleterious amino acid substitutions. SNPs&GO provides 79% accurate results with sequence-based inputs and 83% accurate results with structure-based inputs [14].

BioScientific Review Volume 2 Issue 1, 2020

For the analysis of splice site variants, the Spliceman, SPiCE, and HSF tools were used. Spliceman is an online tool that predicts how likely a mutation located at a distance from a splice site is to affect splicing [15]. It reports its results as a ranked list that predicts the effect of each point mutation on pre-mRNA splicing. Spliceman accepts input containing the reference and mutated alleles together with their flanking regions, known as hexamers. The higher the L1 distance, the higher the chance that the splice site is disrupted, and vice versa [16]. SPiCE is another powerful tool for the prediction of splice site variants. SPiCE combines in silico predictions from SpliceSiteFinder-like and MaxEntScan and applies two threshold values, 0.115 and 0.749 [17]. These two thresholds are used to classify variants into the following predictive classes: (i) low: variant with no effect on splicing, (ii) medium: variant likely to alter splicing, and (iii) high: spliceogenic variant [18,19]. Human Splice Finder (HSF) is a versatile prediction tool designed to check the impact of different types of mutations, including missense, nonsense, and splice site mutations. HSF identifies splicing motifs in any human sequence and also predicts the effects of variants at splice sites. It is updated regularly and has proved to be very helpful for research, diagnostic, and therapeutic purposes as well as for the Human Variome Project. It gives detailed predictions about 5'ss, 3'ss, and branch point sequences as well as ESE and ESS motifs [20].
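The three SPiCE predictive classes described above amount to thresholding a score at two cut points. The following is a minimal sketch, not the tool's own implementation: the function name spice_class is hypothetical, the input is assumed to be a single probability between 0 and 1, and the handling of values that fall exactly on a threshold is an assumption.

```python
# Thresholds quoted in the text for SPiCE [17].
LOW_THRESHOLD = 0.115
HIGH_THRESHOLD = 0.749


def spice_class(probability: float) -> str:
    """Map a splicing-alteration probability to a predictive class.

    Hypothetical helper: boundary handling (< versus <=) is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_THRESHOLD:
        return "low"      # variant with no predicted effect on splicing
    if probability <= HIGH_THRESHOLD:
        return "medium"   # variant likely to alter splicing
    return "high"         # spliceogenic variant


if __name__ == "__main__":
    for p in (0.05, 0.40, 0.90):  # made-up probabilities, not study data
        print(p, spice_class(p))
```

Using two cut points rather than one leaves an intermediate class for variants whose prediction is not clear either way.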

Variant Identification
The ExAC, gnomAD, dbSNP, and Variation Viewer databases were used to compile a list of variants of uncertain significance. The gnomAD v2 release (GRCh37/hg19), which contains largely non-overlapping samples, was used. The dbSNP and Variation Viewer databases contain nucleotide variants reported in human disorders. Because the focus of this study was on missense and splice site variants, all other variants in the AP4S1 gene were excluded. The remaining variants were filtered on the basis of quality and allele frequency. Only variants with an allele frequency <0.002 were included, and they were further analyzed through CADD.
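The inclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Variant record, its field names, the consequence labels, and the example entries are hypothetical and do not reproduce the export format of any of the databases named above.

```python
from dataclasses import dataclass

MAX_ALLELE_FREQUENCY = 0.002                     # inclusion cutoff from the text
KEPT_CONSEQUENCES = {"missense", "splice_site"}  # hypothetical labels


@dataclass(frozen=True)
class Variant:
    variant_id: str          # placeholder identifier, not a real variant
    consequence: str
    allele_frequency: float


def select_rare_variants(variants):
    """Keep missense/splice-site variants with allele frequency below the cutoff."""
    return [
        v for v in variants
        if v.consequence in KEPT_CONSEQUENCES
        and v.allele_frequency < MAX_ALLELE_FREQUENCY
    ]


if __name__ == "__main__":
    example = [  # made-up records for illustration only
        Variant("var_1", "missense", 0.0001),
        Variant("var_2", "synonymous", 0.0001),
        Variant("var_3", "splice_site", 0.01),
    ]
    print([v.variant_id for v in select_rare_variants(example)])
```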

Department of Life Sciences

CADD
The CADD tool was used to evaluate the deleteriousness of SNPs. CADD is capable of prioritizing functional, deleterious, and disease-causing variants across a wide range of functional categories, effect sizes, and genetic architectures. Variants were uploaded as a VCF file and analyzed following the instructions for variant analysis. The results were interpreted from the output file, which ranks variants by deleteriousness using a C-score [21]. A C-score >20 was set as the cutoff to select putatively pathogenic variants.
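To make the C-score cutoff concrete, the PHRED-like scaling used for scaled CADD scores can be written as -10 * log10(rank / total), where rank is the position of a variant among all scored variants. The sketch below is illustrative only; the function names are hypothetical and the calculation is not taken from the CADD code base.

```python
import math


def phred_scaled(rank: int, total: int) -> float:
    """PHRED-like score for a variant ranked `rank` (1 = most deleterious) of `total`."""
    if not 1 <= rank <= total:
        raise ValueError("rank must satisfy 1 <= rank <= total")
    return -10.0 * math.log10(rank / total)


def top_fraction(c_score: float) -> float:
    """Fraction of all scored variants ranked at or above a scaled C-score."""
    return 10.0 ** (-c_score / 10.0)


if __name__ == "__main__":
    for cutoff in (15, 20):  # the two cutoffs mentioned in this article
        print(cutoff, top_fraction(cutoff))
```

Under this scaling, a cutoff of 20 corresponds to the top 1% of scored variants, and a cutoff of 15 to roughly the top 3%.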

Protein Stability Analysis
A single amino acid alteration can lead to the loss of protein function, so it is necessary to study how such substitutions change protein stability. iStable and I-Mutant are tools that predict protein stability. The difference in folding free energy between the wild-type and mutant proteins is the major contributor to protein instability. Computational tools allow researchers to estimate the change in protein stability caused by a single amino acid alteration without conducting experimental studies. iStable and I-Mutant predict this change using sequence and structural information about the protein [22]. iStable combines the I-Mutant, MUpro, and iStable predictors and reports the result as an increase or decrease in stability [23]. Only those variants that showed a decrease in protein stability were selected.
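The stability criterion can be illustrated with a short sketch. It assumes the sign convention in which the free energy change is computed as the unfolding free energy of the mutant minus that of the wild type, so that a negative value indicates a decrease in stability; other tools may report the opposite sign. The function name and the example values are hypothetical.

```python
def stability_change(dg_unfold_wild_type: float, dg_unfold_mutant: float):
    """Return (ddG in kcal/mol, label) for a single amino acid substitution.

    Assumed convention: ddG = dG_unfold(mutant) - dG_unfold(wild type),
    so ddG < 0 means the mutant is less stable than the wild type.
    """
    ddg = dg_unfold_mutant - dg_unfold_wild_type
    if ddg < 0:
        label = "decrease"
    elif ddg > 0:
        label = "increase"
    else:
        label = "no change"
    return ddg, label


if __name__ == "__main__":
    # Made-up free energies, for illustration only.
    print(stability_change(dg_unfold_wild_type=5.0, dg_unfold_mutant=3.5))
```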

Splice Site Variants
Splice site variants for AP4S1 were identified and analyzed using Spliceman, HSF, and SPiCE [24].

Missense Variant Analysis for AP4S1 Gene
AP4S1 variants were obtained from ExAC, gnomAD, and dbSNP, and a total of 374 missense variants were selected for computational analysis. After applying the allele frequency filter (<0.002), 12 variants were excluded and the remaining 362 variants were analyzed with in silico tools. Variants were annotated with CADD, and those with a missense consequence and a C-score >15 were retained. This resulted in the exclusion of 262 variants. The remaining 100 variants were further analyzed and filtered on the basis of a PhD-SNPg score >0.5, a PROVEAN score <-2.5, and a SNPs&GO score >0.5. As a result, the six most pathogenic missense variants were obtained. Details of these six variants are provided in Table 1. The I-Mutant and iStable tools were used to study the effects of these missense variants on protein stability (Table 3).
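The successive score filters described above reduce to a conjunction of four threshold tests. The sketch below uses the cutoffs stated in the text; the MissenseScores record, its field names, and the example values are hypothetical, and the scores are assumed to have been collected from each tool beforehand.

```python
from typing import NamedTuple


class MissenseScores(NamedTuple):
    cadd_c_score: float
    phd_snpg: float
    provean: float
    snps_and_go: float


def passes_all_filters(s: MissenseScores) -> bool:
    """True only if every tool's score crosses its cutoff from the text."""
    return (
        s.cadd_c_score > 15      # CADD C-score cutoff
        and s.phd_snpg > 0.5     # PhD-SNPg cutoff
        and s.provean < -2.5     # PROVEAN: more negative is more deleterious
        and s.snps_and_go > 0.5  # SNPs&GO cutoff
    )


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(passes_all_filters(MissenseScores(24.1, 0.8, -4.2, 0.7)))
    print(passes_all_filters(MissenseScores(24.1, 0.8, -1.0, 0.7)))
```

Requiring agreement among all tools trades sensitivity for specificity, which matches the stringent selection described here.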

Splice Site Variants Analysis of AP4S1 Gene
A total of 420 splice site variants of the AP4S1 gene were identified through ExAC, gnomAD, and dbSNP. Variants with an allele frequency of 0.002 or higher were excluded, which left 408 variants. CADD was used to analyze these remaining variants. Among these variants, 471 showed a C-Score above 20 and only 5 were present in the canonical splice site. The effect of these variants on splicing was further analyzed using Spliceman, HSF, and SPiCE. The results for these variants are presented in Table 2.
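The canonical splice site check can be sketched as follows, assuming that variants are written in HGVS coding notation and that "canonical" refers to the first two intronic bases on either side of an exon (offsets +1, +2, -1, and -2). The function name, the simplified regular expression, and the example strings are hypothetical and do not describe variants from this study.

```python
import re

# Matches the start of an intronic HGVS c. position such as "c.100+1" or "c.100-2".
_INTRONIC = re.compile(r"^c\.\*?-?\d+([+-])(\d+)")


def is_canonical_splice_site(hgvs_c: str) -> bool:
    """True if the variant lies at intronic offset +1, +2, -1, or -2."""
    match = _INTRONIC.match(hgvs_c)
    if match is None:
        return False  # exonic position or notation this sketch does not parse
    return int(match.group(2)) in (1, 2)


if __name__ == "__main__":
    # Made-up notations, for illustration only.
    for name in ("c.100+1G>A", "c.100-2A>G", "c.100+5G>A", "c.100G>A"):
        print(name, is_canonical_splice_site(name))
```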

Discussion
Bioinformatics plays a vital role not only in the analysis of modern sequencing data but also in its interpretation. Currently, in silico approaches are used to predict whether a given SNP is associated with a particular disorder, taking clinical, demographic, structural, and bioinformatic parameters into account when the results are analyzed [25]. In silico studies take a functional approach to identifying deleterious mutations. Special attention should be given to single nucleotide polymorphisms in coding and non-coding regions that alter RNA, and their effects should be verified through functional studies. In this study, in silico analysis was used to determine the deleterious effects of missense and splice site variants of uncertain significance in the AP4S1 gene.
Adaptor proteins, including adaptor protein-4, are believed to be new players in endosomal trafficking. The roles of the AP-1 to AP-3 complexes have been studied in detail. The AP-4 complex plays a specific role in the membrane trafficking (from the trans-Golgi network to the endosome) of particular proteins, such as amyloid precursor protein (APP). It also has a role in basolateral sorting in polarized cells, in vesicle formation, and in the selective uptake of proteins into vesicles [9]. Six missense and five splice site variants in AP4S1 (NM_007077) are reported in this study. Two of the splice site variants have been reported previously in the ClinVar database [26] as causing HSP, which supports the validity of the methodology followed in this study. The other three splice site variants in the AP4S1 gene are reported here as novel variants with no record in the ClinVar database.
In our research, we identified nonsynonymous and splicing variants in AP4S1 that can be deleterious in the homozygous form and cause HSP. For the selection of variants, a stringent criterion was adopted for each predictive tool, and highly deleterious missense and splice site variants of the AP4S1 gene were obtained. To check the impact of missense variants on protein stability, the iStable and I-Mutant tools were used.

Conclusion
In this study, variants caused by amino acid substitutions and canonical nucleotide changes were analyzed using established computational tools. All identified variants have the potential to cause an autosomal recessive form of hereditary spastic paraplegia. Finally, these variants were compared with data in the ClinVar database to check whether they have been reported previously. In vivo analysis can further confirm the findings of this study. All reported variants are rare, with an allele frequency <0.002 in gnomAD and a C-score greater than 20, indicating that these variants are deleterious. The current study highlights the importance of computational modelling for identifying deleterious variants in recessive disorders.