Phages, Phage-plasmids, and Plasmids Sequence Predictions from Metagenome Sequences Using Machine Learning and Deep Learning Algorithms
Uddin, Md Karim (2025)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025052315139
Abstract
Mobile genetic elements (MGEs) are major drivers of bacterial evolution and adaptation. Phage-plasmids, a hybrid class with both phage and plasmid characteristics, are emerging as significant yet elusive contributors to the spread of antimicrobial resistance. This thesis introduces a novel computational framework that goes beyond the traditional binary classification of MGEs, using machine learning to distinguish phages, plasmids, and phage-plasmids from sequence data alone.
Using pentamer (k=5) frequency profiles derived from 4,248 curated MGE sequences, I developed and compared three models of increasing complexity: Logistic Regression, Random Forest, and a Convolutional Neural Network (CNN). The CNN achieved 90% accuracy, indicating that deep learning can capture the subtle sequence patterns that define these genetic elements. The precision of 93% for phage-plasmid identification is a significant advance in detecting these hybrid elements in complex metagenomic datasets.
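To make the feature representation concrete, the following is a minimal sketch of how a pentamer frequency profile could be computed from a single sequence. It is not the thesis code: the function name `kmer_profile`, the normalization to relative frequencies, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=5):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    Hypothetical helper: normalization and the handling of ambiguous
    bases are assumptions, not details taken from the thesis.
    """
    seq = sequence.upper()
    # Count k-mers over every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed lexicographic order so every sequence shares one feature layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows with non-ACGT characters (e.g. N) are not in `kmers`,
    # so they are excluded from the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


profile = kmer_profile("ACGTACGTAC")
print(len(profile))  # one feature per possible pentamer
```

With k=5 this yields a vector of 4^5 = 1024 values per sequence, which is the kind of fixed-length input that Logistic Regression, Random Forest, and CNN models can all consume.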
My analysis uncovered distinctive genomic signatures for each MGE class: AT-rich motifs dominate phages, GC-rich patterns characterize plasmids, and unique sequence compositions mark phage-plasmids. Dimensionality reduction visualizations were consistent with an intermediate evolutionary position for phage-plasmids and revealed multiple distinct clusters, suggesting diverse evolutionary trajectories for these hybrid elements.
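The AT-rich versus GC-rich contrast can be illustrated with a simple GC fraction per sequence. This is only a sketch of the underlying idea, not the analysis performed in the thesis: the helper name `gc_fraction` and the decision to ignore non-ACGT characters are assumptions.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Hypothetical helper for illustration; ambiguous bases are ignored.
    """
    seq = sequence.upper()
    # Only count unambiguous bases in the denominator.
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


print(gc_fraction("GGCCAATT"))
```

A low value indicates an AT-rich sequence and a high value a GC-rich one; the k-mer patterns described above capture richer structure than this single number.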
Beyond its methodological contributions, this research provides critical biological insights into the sequence-level characteristics that underpin the hybrid functionality of phage-plasmids. The intermediate nucleotide composition and distinctive k-mer patterns observed in phage-plasmids offer computational evidence supporting their proposed role as evolutionary bridges facilitating genetic exchange between different MGE types.
This work creates new possibilities for metagenomic exploration, antimicrobial resistance surveillance, and biotechnological innovation by enabling accurate identification of all three MGE classes without requiring gene annotation or reference databases. By illuminating the genomic nature of these important vehicles of bacterial adaptation, this research advances our fundamental understanding of horizontal gene transfer and provides practical tools to address pressing challenges in infectious disease and microbial ecology.