Faryal Sadiq*
Department of Life Sciences, University of Management & Technology, Lahore, Pakistan
Retinitis Pigmentosa (RP) is a term used to describe a group of eye disorders related to retina damage. Due to the formation of bone spicules inside the retina of the affected individuals, patients suffer from poor eye vision or complete blindness, rarely. The reason of the disease could be genetic defects or environmental factors, for instance comorbidity, light exposure, and ethnicity. Tubby-like protein (TULP1) is expressed in the retina of the eye, specifically inside rod and cone cells, and mutations inside the gene can cause changes in structure and function of photoreceptor cells. The current study may provide new insights to understand the genetic variations found in TULP1 gene from the genomic to proteomic level, thus predicting the highly pathogenic variations causing RP. Different bioinformatic tools employing different algorithms were used to score the pathogenic variants, hence cross validating the results. Twenty (20) pathogenic missense variants which can destroy the protein structure as discussed in the study, thirteen (13) splice site variants, and lastly, nine (9) frameshift, seven (7) stop-gained variants were concluded as highly pathogenic for the candidate gene.
Eye is the main organ associated with vision and is a major part of the sensory nervous system. Poor eye vision or blindness can directly affect the quality of life. Several retinal damaging conditions including glaucoma, retinal vascular disease, and macular degeneration associated with aging, are known to affect approximately 60 million people worldwide. Some eye disorders occur due to certain environmental factors, for instance aging and poor health conditions. While, some are heredity traits where RP is among the most prominent. Approximately, one hundred (100) genes have been observed to cause hereditary RP. RP is a vast terminology used to group eye disorders with retinal damages leading to poor vision and in rare cases complete night blindness [1]. The prevalence ratio of RP is 1/3500 around the world [2].
There are two types of RP stated as non-syndromic RP and syndromic RP. Non-syndromic RP is related to poor vision and has been found to affect 70-80% of the people and is a more common type of RP. The incidence rate for non-syndromic RP is 1/5000, while carrier ratio observed is 1/1000. The other type of RP is syndromic RP where the retinal damage combines with some other systemic dysfunction. Usher syndrome is the typical example of syndromic RP in which vision loss conjugates with hearing disabilities [3].
The risk factor to transfer the hereditary disorders including hereditary RP as well as spastic paraplegia is high in South Asian region. As reported by statistical data, the autosomal recessive RP with the number of 60 genes is known to be involved [4].
TULP1 gene is an important member of TULP (Tubby-like protein) family and is found in retinal epithelial cells. TULP1, as a transmembrane protein, mainly functions in protein trafficking and has the tendency to bind with lipids forming phosphatidylserine, phosphatidylinositol, 3,4,5-bisphosphate, phosphatidic acid, 4,5-bisphosphate, 3,4-bisphosphate, phosphatidylinositol 5-phosphate, phosphatidylinositol 4-phosphate, and phosphatidylinositol 3-phosphate. TULP1 also functions in the cleaning of dead cells inside the retina and is involved in cell apoptosis/phagocytosis [5].
TULP1 gene with genomic size 14997 base pairs (15 exome count) is located on human chromosome 6, on the antisense strand of DNA (Deoxyribonucleic Acid) double helix. TULP1 gene encodes protein with 542 amino acids (Figure 1).
Figure 1. Pathogenic variants observed from cDNA (SNV) to protein. (An illustration of TULP1 gene (containing 15 exons, locates at 6p21.32) to protein translation (encoding 542 amino acid residues, region ranges between 298-542 covers a conserved C-Terminal domain and region between 1-289 linked with intrinsic disorder))
In the analysis of the current study, results were crossmatched using different tools and databases with different algorithms. The importance of algorithm designing was reflected upon for any biologist or healthcare expert. Not all the tools or software are 100% error free, however, still can be a helping hand for the scientists to make authentic predictions. Presently, scientists prefer dry lab practices before proceeding to wet lab experiments. This is because it is a more helpful approach in a limited time span for the scientists to predict the structure of any biological molecule even if it has not been discovered yet via using these tools or databases [6].
TULP1 genomic variants were retrieved through gnomAD. Missense variants, splice site variants, frameshift, and stop-gained variants were analyzed for their pathogenicity grading using multiple tools. Protein functioning was observed through visualization tool. Streamline followed for the insilico approach is shown in Figure 2.
Figure 2. Flowchart to carry out TULP1 Gene Variants Analysis
To study the effect of SNPs reported for TULP1 gene, these were retrieved from gnomAD gnomAD - Broad Institute, containing millions of SNVs (Single Nucleotide Variants) data available for human genome in both genome assemblies (GRCh37 and GRCh38) [7]. The number of variants obtained for TULP1 is shown in Figure 3.
Figure 3. Distribution of SNPs in TULP1 Gene
TULP1 missense variants were scored with a very low allelic frequency. Combined Annotation Dependent Depletion (CADD) tool was designed on machine learning (ML) algorithm combining more than 63 annotations. Variants with CADD C-Score range between 10-20 are considered highly pathogenic [8]. Variants with CADD C-score ≥ 10 were further analyzed using multiple tools, for instance CAPICE [9], REVEL, PMut, UMD-Predictor, and Predict SNP2.0 Mutation Assessor. Quality control of missense variants was verified using different parameters which included checking the protein stability using stability tools, (CUPSAT, iStable [10], DynaMut [11], MAESTRO, YASARA [12]) and Clashes finding inside the protein structure as well as PTM sites analysis through UCSF Chimera [13]. To observe the candidate gene functioning caused by variations, conservation ratio and secondary structures were also observed via ConSurf and NetSurfP-2.0 [14].
Scoring of TULP1 splice site variants was performed to screen out the highly pathogenic variants for disrupting the splicing mechanism. SPiCEv2.15 [15], Spliceman, Mutation T@ster [16] tools were used for variant analysis.
Frameshift and nonsense variants scoring was performed using CADD.
Through gnomAD, total number of 324 missense variants for candidate gene (TULP1) (allelic frequency <0.001 and allelic count < 50) were obtained. CADD scored 274 variants as pathogenic variants with C score ≥ 10. Most of the tools set a threshold value of 0.5, while variants score above 0.5 are considered pathogenic. To filter out the highly pathogenic variants, multiple tools were used, that is, CAPICE [9], REVEL, PMut, UMD-Predictor [17], Predict-SNP2.0, and Mutation Assessor. Lastly, only 20 missense variants with CAPICE score ≥ 0.5 were filtered out as "highly pathogenic" (Table 1). These twenty missense variants were further analyzed for candidate protein functioning. Graphical representation of CADD and CAPICE scoring of missense variants can be seen in Figure 4 (a).
Figure 4. Graphical Representation of pathogenic TULP1 genomic variants scoring (a) highly pathogenic missense variants using (CADD & CAPICE scoring). (b) protein Stability tools scoring vs missense variants (c) Splice site variants through SPiCE (Considering the Red line as threshold and dots above the threshold are variants affecting splicing while dots below threshold are less deleterious variants for splicing disturbance) (d) Pathogenic frameshift variants scoring by CADD (e) Stopgained variants by CADD
Table 1. TULP1 Pathogenic Missense Pathogenic Variants Scoring Obtained using Different Tools
Nucleotide Change |
Protein Change |
CADD C Score |
CAPICE Score |
REVEL Score |
PMut Predicted Score |
UMD-Predictor Prediction |
Predict-SNP2.0 Prediction |
Mutation Assessor Prediction |
6:35466133T>C |
p.Ser534Gly |
33 |
Deleterious 0.775 |
Deleterious 0.917 |
Deleterious 0.91 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.865 |
6:35466201T>C |
p.Glu511Gly |
34 |
Deleterious 0.836 |
Deleterious 0.71 |
Deleterious 0.68 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.02 |
6:35466229T>A |
p.Ile502Phe |
32 |
Deleterious 0.722 |
Deleterious 0.914 |
Deleterious 0.79 |
Pathogenic 0.78 |
Pathogenic 0.076 |
High 3.06 |
6:35466231T>A |
p.Tyr501Phe |
24.9 |
Deleterious 0.825 |
Deleterious 0.858 |
Deleterious 0.87 |
Probably Pathogenic 0.69 |
Pathogenic 0.076 |
Medium 2.945 |
6:35467808C>T |
p.Arg482Gln* |
33 |
Deleterious 0.924 |
Deleterious 0.942 |
Deleterious 0.92 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.955 |
6:35467809G>A |
p.Arg482Trp* |
28.6 |
Deleterious 0.817 |
Deleterious 0.966 |
Deleterious 0.92 |
Pathogenic 0.93 |
Pathogenic 0.122 |
High 3.955 |
6:35467820T>G |
p.Asn478Thr |
27 |
Deleterious 0.748 |
Deleterious 0.952 |
Deleterious 0.87 |
Pathogenic 0.81 |
Pathogenic 1 |
High 3.905 |
6:35467820T>C |
p.Asn478Ser |
26 |
Deleterious 0.764 |
Deleterious 0.949 |
Deleterious 0.92 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.905 |
6:35467863T>G |
p.Lys464Gln |
26.7 |
Deleterious 0.751 |
Deleterious 0.93 |
Deleterious 0.92 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.345 |
6:35467923C>T |
p.Asp444Asn* |
28.6 |
Deleterious 0.716 |
Deleterious 0.541 |
Neutral 0.44 |
Probably Pathogenic 0.69 |
Pathogenic 1 |
Medium 2.945 |
6:35471398T>C |
p.Met421Val |
25.5 |
Deleterious 0.707 |
Deleterious 0.949 |
Deleterious 0.9 |
Probably Pathogenic 0.72 |
Pathogenic 1 |
High 3.935 |
6:35471539C>T |
p.Arg400Gln* |
31 |
Deleterious 0.951 |
Deleterious 0.964 |
Deleterious 0.8 |
Probably Pathogenic 0.72 |
Pathogenic 1 |
High 3.455 |
6:35473518C>T |
p.Arg371Lys |
34 |
Deleterious 0.745 |
Deleterious 0.897 |
Deleterious 0.71 |
Pathogenic 1 |
Pathogenic 1 |
Medium 2.945 |
6:35473528C>T |
p.Gly368Arg |
31 |
Deleterious 0.847 |
Deleterious 0.965 |
Deleterious 0.91 |
Pathogenic 1 |
Pathogenic 1 |
High 3.89 |
6:35473542C>T |
p.Gly363Glu |
25.8 |
Deleterious 0.843 |
Deleterious 0.948 |
Deleterious 0.79 |
Pathogenic 0.99 |
Pathogenic 1 |
High 3.42 |
6:35473554A>C |
p.Leu359Arg |
26.8 |
Deleterious 0.815 |
Deleterious 0.993 |
Deleterious 0.83 |
Pathogenic 0.93 |
Pathogenic 1 |
High 3.64 |
6:35473572G>A |
p.Ser353Phe |
28.7 |
Deleterious 0.902 |
Deleterious 0.981 |
Deleterious 0.77 |
Pathogenic 0.96 |
Pathogenic 1 |
High 3.205 |
6:35473605C>T |
p.Arg342Gln* |
27.8 |
Deleterious 0.713 |
Deleterious 0.855 |
Deleterious 0.89 |
Pathogenic 0.84 |
Pathogenic 1 |
High 3.32 |
6:35473606G>C |
p.Arg342Gly |
25.8 |
Deleterious 0.778 |
Deleterious 0.724 |
Deleterious 0.89 |
Pathogenic 1 |
Pathogenic 1 |
High 3.665 |
6:35473848G>A |
p.Arg311Trp* |
26.1 |
Deleterious 0.808 |
Deleterious 0.885 |
Deleterious 0.89 |
Pathogenic 0.93 |
Pathogenic 1 |
High 3.12 |
(UMD predictor score predictions were divided by 100 to homogenize the data.)
Mutations with Asterisk (*) sign have already been reported in ClinVar and the rest are novel.
3.3.1. Effect of Missense Variations on Protein Stability. Highly pathogenic missense variants were analyzed by CUPSAT, i-Stable, DynaMut, MAESTRO, and YASARA scoring methods for stability check against each variant. Negative scores indicate the destabilizing effect of mutations on protein structure and vice versa for positive scores. In YASARA, variants with positive scores show decreasing protein energy due to the variant (Table 2).
Graphical representation of stability scores by tools used can be seen in Figure 4(b).
3D structure was modeled for TULP1 protein with missense variations through UCSF Chimera (Figure 5).
Table 2. Stability Score using Stability Tools to Analyze the Effect of Missense Mutations on Protein
Protein Change |
CUPSAT Predicted ΔΔG (kcal/mol) |
iStable Predicted ΔΔG (kcal/mol) |
DynaMut Predicted ΔΔG (kcal/mol) |
MAESTRO Predicted ΔΔG (kcal/mol) |
YASARA Stability Prediction (kcal/mol) |
p.Arg400Gln |
Stabilizing 1.73 |
Decrease -0.640 |
Destabilizing -0.519 |
Increase 1.645 |
Increasing -0.718972 |
p.Arg482Gln |
Destabilizing -0.69 |
Increase 0.458 |
Destabilizing -0.404 |
Increase 1.424 |
Decreasing 2.3409 |
p.Ser353Phe |
Stabilizing 3.73 |
Increase 0.071 |
Stabilizing 1.217 |
Decrease -0.896 |
Increasing -2.1339 |
p.Gly368Arg |
Destabilizing -1.42 |
Decrease -0.639 |
Destabilizing -0.933 |
Decrease -0.393 |
Decreasing 19.9686 |
p.Glu511Gly |
Destabilizing -2.9 |
Decrease -0.535 |
Destabilizing -0.328 |
Increase 0.645 |
Decreasing 1.43369 |
p.Tyr501Phe |
Stabilizing 2.68 |
Decrease -0.230 |
Destabilizing -0.161 |
Increase 0.529 |
Decreasing 0.67915 |
p.Arg482Trp |
Destabilizing -1.56 |
Decrease -0.009 |
Stabilizing 0.051 |
Increase 0.385 |
Decreasing 2.77073 |
p.Arg311Trp |
Stabilizing 0.22 |
Decrease -0.345 |
Stabilizing 0.29 |
Decrease -0.183 |
Decreasing 6.73481 |
p.Arg342Gly |
Stabilizing 1.22 |
Decrease -1.052 |
Destabilizing -0.655 |
Increase 1.054 |
Decreasing 0.735874 |
p.Ser534Gly |
Stabilizing 1.02 |
Decrease -1.283 |
Destabilizing -0.47 |
Increase 1.401 |
Decreasing 1.09835 |
p.Asn478Thr |
Stabilizing 2.95 |
Decrease -0.230 |
Destabilizing -0.125 |
Increase 0.632 |
Decreasing 0.1565 |
p.Lys464Gln |
Stabilizing 1.11 |
Decrease -0.592 |
Destabilizing -0.233 |
Increase 0.911 |
Decreasing 1.09376 |
p.Asn478Ser |
Stabilizing 0.22 |
Decrease -0.054 |
Stabilizing 0.114 |
Increase 0.553 |
Decreasing 0.424982 |
p.Arg371Lys |
Destabilizing -1.62 |
Decrease -0.911 |
Destabilizing -0.721 |
Increase 0.831 |
Decreasing 0.071761 |
p.Ile502Phe |
Destabilizing -0.07 |
Decrease -0.545 |
Stabilizing 1.305 |
Increase 0.510 |
Decreasing 1.53253 |
p.Asp444Asn |
Destabilizing -0.39 |
Increase 0.752 |
Destabilizing -0.176 |
Increase 1.108 |
Decreasing 1.22898 |
p.Arg342Gln |
Stabilizing 0.33 |
Decrease -0.639 |
Destabilizing -0.48 |
Increase 0.810 |
Increasing -0.285053 |
p.Met421Val |
Destabilizing -3.23 |
Decrease -1.213 |
Destabilizing -0.816 |
Increase 1.324 |
Decreasing 1.67459 |
Figure 5. TULP1 3D Structure Visualization by Angle Rotation of 180 Degrees (red area marked the mutated sites)
3.3.2. Clashes Finding due to Missense Variations. Clashes can be simply termed as unnecessary/unwanted contacts due to mutations,thereby changing the confirmation. Clashes were observed using UCSF Chimera.
Clashes were determined in mutations reported; p.Gly368Arg in which no clashes/contacts were found in wild type structure, while for mutated type 46, contacts were found.
p.Glu511Gly: no clashes were found for both wild type and mutant type, p.Tyr501Phe: no clashes were found for wild type. Whereas, 9 contacts including interaction with water molecule were observed for mutant type (Yellow color indicating the wild type of residue and purple color showing the mutant residue whereas red lines showing the contacts found, blue colored balls showing the water molecules) (Figure 6).
Figure 6. Clashes Findings due to Missense Mutations in Candidate Protein Structure using UCSF Chimera
3.3.3. PTM Site Disruption by Missense Variants. PTM sites inside candidate protein were confirmed using PROSITE database.
Phosphorylation sites with different protein kinase were observed and visualized in protein structure using UCSF Chimera. Some of the missense variations were found to disturb some phosphorylation sites, hence hindering the cellular processes (Figure 7).
Figure 7. PTM sites Disruptions Visualization through UCSF Chimera
3.3.4. Protein Conservation and Secondary Structure Prediction. Candidate protein (TULP1) has a single C-Terminal domain. To observe the conservation, ConSurf was used for analysis. ConSurf prediction scoring is done in terms of color grading (1-9) in which 1-4 grades are variables, grade 5 as average, and 6-9 color grade indicates highly conserved amino acid residues inside protein sequence.
NetSurfP-2.0 predicts surface accessibility as RSA (Relative Surface Accessibility), ASA (Absolute Surface Accessibility) and also provides information for secondary structures (coil, helix, strand) present in a protein. For secondary structures (loops or coils) inside TULP1 protein and to observe that amino acids are buried/exposed to the surface in sequence, NetSurfP-2 was used. Table 3, visuals seen in Figure 8.
Table 3. TULP1 Protein Conservation Prediction due to Missense Mutations using ConSurf and Solvent Accessibility, Secondary Structure Prediction by NetSurfP-2.0
ConSurf |
NetSurfP-2.0 |
||||
Protein Change |
Score |
Grade |
RSA |
ASA (Å) |
Sec. Structure/exposed/Buried |
p.Arg400Gln* |
-0.905 |
8 |
51% |
116 |
Coil/Exposed |
p.Arg482Gln* |
-1.199 |
9 |
52% |
119 |
Coil/Exposed |
p.Ser353Phe |
-0.962 |
9 |
4% |
4 |
Strand/Buried |
p.Gly368Arg |
-0.866 |
8 |
3% |
2 |
Strand/Buried |
p.Glu511Gly |
0.225 |
4 |
82% |
144 |
Coil/Exposed |
p.Tyr501Phe |
0.338 |
4 |
34% |
73 |
Coil/Exposed |
p.Arg482Trp* |
-1.199 |
9 |
52% |
119 |
Coil/Exposed |
p.Arg311Trp* |
-1.102 |
9 |
26% |
59 |
Strand/Exposed |
p.Arg342Gly |
-0.577 |
7 |
32% |
73 |
Strand/Exposed |
p.Ser534Gly |
-1.115 |
9 |
10% |
12 |
Helix/Buried |
p.Asn478Thr |
-0.932 |
8 |
39% |
57 |
Strand/Exposed |
p.Lys464Gln |
-0.9 |
8 |
36% |
74 |
Coil/Exposed |
p.Asn478Ser |
-0.932 |
8 |
39% |
57 |
Strand/Exposed |
p.Arg371Lys |
-0.881 |
8 |
30% |
68 |
Strand/Exposed |
p.Ile502Phe |
-0.467 |
7 |
8% |
16 |
Strand/Buried |
p.Asp444Asn* |
0.225 |
4 |
52% |
74 |
Coil/Exposed |
p.Arg342Gln* |
-0.577 |
7 |
32% |
73 |
Strand/Exposed |
p.Met421Val |
-1.063 |
9 |
3% |
6 |
Strand/Buried |
Figure 8. TULP1 Protein Secondary Structure Prediction by NetSurfP-2.0. Surface Accessibility=Red high ground indicating exposed amino acids while sky blue downward high areas showing the buried residue in protein, considering threshold as 25%. Secondary Structure= straight pink line depicting coil, orange spirals are helix, colored arrow (indigo) indicating strand. Disorder= just under the secondary structure prediction line is thick black line showing the chances of disordered residues, the thicker the line, the higher the risk of disordered residues there. (as shown in figure 1)
By gnomAD, 57 splice site variants were obtained for candidate gene (TULP1). Seven (7) variants were canonical splice site variants (splice site donor=4, splice site acceptor=3). The remaining 50 variants were scored using CADD. Thirteen (13) variants with CADD score ≥ 15 were scored by splicing tools (SPiCE v2.1.5, Spliceman, Mutation T@ster) (Table 4 & Figure 4 (c)).
Table 4. TULP1 Pathogenic Splice Site Variants Score
Nucleotide Change |
Transcript Consequence |
CADD C Score |
Mutation T@ster Prediction |
Spliceman Prediction |
SPiCE Prediction |
||||
Score |
L1 distance |
Rank (L1) |
delta_MES |
delta_SSF |
probability |
inter_thrRes |
|||
6:35466243G>T |
c.1496-6C>A* |
17.15 |
(Polymorphism) 0.999 |
32669 |
55% |
-0.16 |
0.014 |
0.06131 |
low |
6:35467757C>T |
c.1495+1G>A* |
35 |
(deleterious) 1 |
33807 |
61% |
-0.905 |
-1 |
1 |
high |
6:35471626C>G |
c.1113-1G>C |
33 |
0 |
36693 |
76% |
0 |
0 |
0.02686 |
low |
6:35477091T>C |
c.719-2A>G |
33 |
(Deleterious) 1 |
36552 |
75% |
-1.139 |
-1 |
1 |
high |
6:35477096G>T |
c.719-7C>A |
21.4 |
(Polymorphism) 0.999 |
34935 |
67% |
0.133 |
0.009 |
0.00688 |
low |
6:35477534G>C |
c.602-7C>G* |
15.05 |
(Polymorphism) 0.999 |
32943 |
57% |
0 |
0 |
0.02686 |
low |
6:35477600T>C |
c.601+4A>G |
17.8 |
(Deleterious) 0.999 |
35062 |
68% |
-0.057 |
-0.12 |
0.68004 |
low |
6:35477602A>C |
c.601+2T>G |
32 |
(Deleterious) 1 |
36446 |
75% |
-1 |
-0.108 |
0.9997 |
high |
6:35478633C>G |
c.499+5G>C |
22.9 |
(Deleterious) 0.538 |
34706 |
66% |
0 |
0 |
0.02686 |
low |
6:35478636A>C |
c.499+1dupG |
26.6 |
(Deleterious) 0.997 |
36014 |
72% |
-1 |
-0.154 |
0.99993 |
high |
6:35479590C>T |
c.191-7G>A |
21.6 |
(Polymorphism) 0.999 |
35880 |
72% |
0.626 |
0.072 |
2.00E-05 |
low |
6:35479952C>T |
c.190+5G>A |
21.1 |
(Polymorphism) 0.999 |
35427 |
69% |
-1.091 |
-0.162 |
0.99998 |
high |
6:35480415C>T |
c.99+1G>A* |
33 |
(Deleterious) 1 |
34481 |
65% |
-0.908 |
-1 |
1 |
high |
Asterisk (*) marked protein substitutions have already been reported in ClinVar, while the rest of mutations are novel.
Nine (9) frameshift variants and seven (7) nonsense variants with the Allelic frequency filter (≤0.002) were retrieved for TULP1 and scored using CADD. Nine (9) frameshift variations were filtered as highly pathogenic (Table 5). While, stop-gained variations were scored high as compared to frameshift variants. (Table 6) Figure 4(d) and Figure 4(e) show the CADD scoring for frameshift and nonsense variants, respectively.
Table 5. Pathogenic Frameshift Variants in TULP1 Gene
Nucleotide Change |
Protein Consequence |
Transcript Consequence |
CADD C Score |
6:35466151AG>A |
p.Phe528SerfsTer55 |
c.1581delC |
30 |
6:35466177CG>C |
p.Arg519GlyfsTer64 |
c.1555delC |
34 |
6:35471403CG>C |
p.Arg419GlyfsTer3 |
c.1255delC |
27.8 |
6:35473799TG>T |
p.His327ThrfsTer44 |
c.979delC |
33 |
6:35476986CT>C |
p.Lys274ArgfsTer36 |
c.821delA |
32 |
6:35477079TTTGG>T |
p.Pro242GlnfsTer16 |
c.725_728delCCAA |
25.4 |
6:35477680T>TG |
p.Pro176ThrfsTer7 |
c.524dupC |
21.1 |
6:35479998TC>T |
p.Glu50AsnfsTer59 |
c.148delG |
22.9 |
6:35480447AG>A |
p.Leu23Ter |
c.67delC |
18.02 |
Table 6. Pathogenic Stop-gained Variants in TULP1 Gene
Nucleotide Change |
Protein Consequence |
Transcript Consequence |
CADD C Score |
6:35466173G>T |
p.Tyr520Ter* |
c.1560C>A |
40 |
6:35471341G>A |
p.Arg440Ter* |
c.1318C>T |
39 |
6:35473549G>A |
p.Arg361Ter |
c.1081C>T |
35 |
6:35473827G>A |
p.Arg318Ter |
c.952C>T |
36 |
6:35478764C>A |
p.Glu125Ter |
c.373G>T |
33 |
6:35479536G>A |
p.Gln80Ter* |
c.238C>T |
32 |
6:35479960C>A |
p.Gly63Ter |
c.187G>T |
34 |
Asterisk (*) marked protein substitutions have already been reported in ClinVar, while the rest of mutations are novel.
TULP1 gene variants are very rare and could be responsible for causing autosomal recessive RP. Protein native structure is important for proper functioning. If any sudden change or mutation occurs inside the amino acid sequence of protein, it directly affects its function. The wild structure type is more stable and mutation inside a protein structure can destabilize it. Stability is important for proteins" proper functioning [18].
TULP1 gene variants (missense, frameshift, nonsense, splice site) were scored according to their degree of pathogenicity using CADD. CADD scores the variants on three different ranks, that is, highly pathogenic, moderate, and lower pathogenic, referring to the main score PHRED (C Score). Variants with C score range between 10-20 are considered highly pathogenic [8]. CAPICE (Consequence Agnostic Pathogenicity Interpretation of Clinical Exome) based on ML algorithm outperformed compared to other tools, due to a low false positive rate [9].
After selecting highly pathogenic missense TULP1 variants, the amino acid substitutions were tested for stability. DynaMut predicts scores in two-step algorithms. The first step is utilizing "Holdout Random Sampler" which calculates the energy change represented as Cumulative distribution function. While in the second step, there comes the authentic stability score using cumulative distribution function which trains the final algorithm-"Neural network model" [11]. istable works on "Support Vector machine algorithm". Due to its uniquely designed algorithm, this tool proved a better option as compared to other tools" output for stability check. Both structural or sequential information of protein can be used as input methods [10] (Figure 4 (b)).
Protein with its native folds is crucial for its proper functioning. Mutations change its structure and, hence function. TULP1 encodes 542 amino acids contained protein structure which was visualized using Chimera. 3D structure with pathogenic missense mutations was modeled [13] (figure 5). Variants predicted as pathogenic for TULP1 were also visualized, disturbing PTM sites by 3D modeling (Figure 7).
TULP1 protein is a transmembrane protein involved in cellular trafficking. Therefore, its nearby interactions, cellular contents, and surface accessibility were observed using NetSurfP-2.0 [14]. In ConSurf, MSA is performed followed by constructing a phylogenetic tree and it also shows the conserved regions inside protein structure with colors [19] (Figure 8).
To find out about variants which can directly affect the splicing mechanisms, splicing variants retrieved through gnomAD were analyzed by Spliceman, SPiCE, and Mutation T@ster.
Spliceman based on "distance matrix calculation methods" provides the output as L1 distance. If position vectors are close enough, it denotes high ranks (high L1 distance). The higher the rank, the higher the chances of disturbing splicing sites [20]. SPiCE (Splicing Prediction in the Consensus Elements) tool designed on ML classifier "Logistic Regression" predicting two scores MaxEntScan(MES) and Splice-Site Finder-Like(SSF-Like) also provides output in a graphical form. In graphical representation (Figure 4c), variants above threshold line are considered as highly pathogenic. While, dots denoting the variants below the threshold line are considered neutral. Geneticists can easily apply this in diagnostics for the analysis of NGS results [20].
Probably, more than 100 genes have been found to be involved in causing RP. In the sub-continent, the custom of consanguineous marriages raises the risk of hereditary autosomal recessive RP. ABCA4, RHO, SPATA7, CRB1, and TULP1 gene families are important in Pakistan and Indian populations. This is because they can be the carrier to cause autosomal recessive RP [21].
Some novel genes, for instance ENSA, DDR1, CCDC188, and DACT2 were found to have an important role in vision cycle and ion channeling in photoreceptor cells. In 2020, a study was conducted to highlight the correlation of these four genes (ENSA, DDR1, CCDC188, DACT2) and the study concluded that the two genes (ENSA, DDR1) co-express and correlate in causing RP [22].
Recently, research was conducted to determine the pathogenic variants in TULP1. The results concluded two heterozygous mutations p. Arg482Trp and p. Leu504fs* as highly pathogenic and can cause autosomal recessive RP [23].
Finding the cure for hereditary disorders has always been a challenging task for the scientists. However, the options which have captured the attention of scientists include, cell therapy, retinal prostheses, and gene therapy. Gene therapy involves removing the genetic errors using viral or non-viral vectors followed by gene silencing pathways. Gene therapy implemented using phase3 voretigene neparvovec vector showed a better efficacy rate against inherited RPE65 mediated retinal dystrophy [24].
In 2020, a finding based on statistical data revealed that probably, 2.7 billion people around the globe (about 36% population) may be the healthy carriers for having at least one of the gene variants which may cause RP in future generations [25]. If scientists consider the cases of visual impairment/inherited blindness for detailed experiments, then this could be the road map in discovering new loci and genes causing RP [26].
RP is generically referred to as a group of heterogeneous eye disorders, where its complicated genetic patterns make it a complex class to study. TULP1 gene is expressed in rod and cone cells of the retina and contributes to 1% autosomal recessive RP. TULP1 gene variants were analyzed through an insilico approach to study the molecular basis of RP and highly pathogenic TULP1 variants were predicted. Predicted pathogenic variants for TULP1 can provide new insights for the scientists to design a treatment line for RP if caused by TULP1 gene variants. Candidate gene variants were cross-checked with different pathogenicity, predicting algorithms using different tools or databases to make predicted pathogenic scoring more authentic and reliable. Predicted pathogenic variants can be confirmed following wet lab experiments from blood sampling to sequencing.
The author of the manuscript has no financial or non-financial conflict of interest in the subject matter or materials discussed in this manuscript.
The data associated with this study will be provided by the corresponding author upon request.
This research did not receive grant from any funding source or agency.