Protein Structure Prediction with AlphaFold2, How it Works, Limitations and Solution for Less number of Homotypic and Large number of Heterotypic Contacts

Hassan Kaleem; Muhammad Noman Khalid

doi:10.32350/icr.0201.02

Hassan Kaleem, SQL Consultancy LTD, 9 Frances Street, Crewe, England
Muhammad Noman Khalid, Allama Iqbal Medical College, Lahore, https://orcid.org/0000-0002-4970-6798

Keywords: Protein Structure Prediction, Limitations of AlphaFold2, Misbalance of homotypic and heterotypic contacts, Homology-based Modeling, Ab Initio Modeling, Feature Extraction of Protein

Abstract

Knowing a protein's structure helps us investigate human diseases related to abnormally folded or misfolded proteins. This research proposes a way to identify an imbalance of homotypic and heterotypic contacts at the sequence stage. There are two approaches to protein structure prediction: template-based modeling and ab initio modeling. Template-based modeling matches the given sequence against the original (template) sequence. Ab initio modeling, in contrast, calculates the weight of the given sequence and identifies whether it is balanced. If the sequence is not balanced, it can be labeled as such at the initial stage by calculating its weight. This research also offers future directions for researchers on how to achieve maximum accuracy in protein structure prediction.
