ENHANCING PROTEIN STRUCTURE AND FUNCTION PREDICTION THROUGH DEEP MULTIPLE SEQUENCE ALIGNMENTS

Dr. Sandeep Kulkarni
Parmeshwari Aland
Ravindra D Patil
Priya Bonte
Ranjana Singh

Keywords

Deep learning, DeepMSA, Deep Belief Networks, AlphaFold, Protein Structure.

Abstract

This paper provides an overview of deep learning algorithms and discusses their potential future advancements. DeepMSA is an approach to constructing multiple sequence alignments (MSAs) that integrates deep learning techniques with iterative database searches. Drawing on extensive genomic and metagenomic datasets, DeepMSA refines traditional MSA methodologies and significantly improves alignment quality for remote homologs and complex protein structures. The framework uses pre-trained sequence embeddings and neural network-based optimization to improve contact prediction, secondary structure inference, and fold recognition. Comparative benchmarks, such as the CASP competitions, show that DeepMSA outperforms traditional methods like PSI-BLAST and HHblits, with higher SP scores and better tertiary structure modeling. DeepMSA2 advances this methodology further by incorporating diverse databases (e.g., Uniclust30, MGnify) and hybrid MSAs for multimeric proteins, achieving state-of-the-art performance in predicting both monomeric and complex structures. These results highlight DeepMSA's role in connecting MSA construction with downstream applications in computational biology, offering a robust platform for protein structure prediction, evolutionary studies, and functional annotation.
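
The SP scores mentioned in the abstract can be illustrated with a short example. The following is a minimal sketch, assuming the common benchmark definition of the sum-of-pairs (SP) score: the fraction of residue pairs aligned in a reference MSA that are also aligned in the MSA under test. The function names `aligned_pairs` and `sp_score` and the toy sequences are illustrative assumptions; they are not part of DeepMSA or of any published tool.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column in the MSA.

    Each residue is identified as (sequence index, ungapped position),
    so the same residue can be matched across different alignments.
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of an MSA must have the same length")
    positions = [0] * len(msa)  # ungapped position reached in each sequence
    pairs = set()
    for col in range(len(msa[0])):
        present = []
        for seq_idx, row in enumerate(msa):
            if row[col] != "-":
                present.append((seq_idx, positions[seq_idx]))
                positions[seq_idx] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test_msa, reference_msa):
    """Fraction of reference residue pairs reproduced by the test MSA."""
    reference = aligned_pairs(reference_msa)
    if not reference:
        raise ValueError("reference MSA contains no aligned residue pairs")
    return len(reference & aligned_pairs(test_msa)) / len(reference)


# Toy example (hypothetical sequences): the test alignment misplaces one gap.
reference_msa = ["AC-GT", "ACTGT"]
test_msa = ["ACG-T", "ACTGT"]
print(sp_score(test_msa, reference_msa))  # 0.75
```

In this toy case the reference aligns four residue pairs and the test alignment recovers three of them, giving 0.75. Both alignments must contain the same sequences in the same order for the comparison to be meaningful.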
