Modern Computer Science Approaches in Biology: From Predicting Molecular Functions to Modeling Protein Structure

Computers have become an inseparable part of human life over the last three decades. One of the key technologies enabling this boom is Artificial Intelligence (AI), the field dedicated to simulating human-like behavior in machines. AI takes many forms; however, one direction, Machine Learning (ML), has been especially impactful in an era of constant data aggregation. ML aims to infer patterns and reason about them automatically, based solely on the input data. Now a household name, machine learning has transformed the natural sciences: it aids physicists working on quantum mechanics, helps astronomers filter noisy data, and accelerates molecular and cellular discoveries by chemists and biologists. Medical care is one of the aspects of everyday life most affected by ML. Perhaps most notably, precision medicine offers a direct opportunity to improve patients’ quality of life. This field is dedicated to identifying why patients respond differently to treatment and to designing the best-suited diagnostic and intervention strategy for each individual. In recent years, the available data pool has been greatly expanded by high-throughput ‘omics’ experimental techniques, making it intractable for conventional manual analysis by a clinician or a biomedical researcher. The omics field emerged in the early 2000s, when next-generation sequencing (NGS) methods first made it possible to study individual genomes. The next big breakthrough came in 2008, when the second generation of NGS became available and drastically decreased the cost of conducting experiments. However, genomics is not the only field that has experienced such a leap.
Other quantitative methods that describe molecular processes in the organism have also advanced rapidly: epigenomics, transcriptomics, proteomics, and metabolomics. Transcriptomics and proteomics are particularly valuable for studying disease because they provide a snapshot of the organism’s current state, allowing us to search for the root cause of a particular ailment. Furthermore, transcriptomics provides information on an important regulatory process, alternative splicing (AS). AS increases the versatility of the organism’s molecular arsenal and allows more complex systems to be built from the same number of genes. This is achieved by combinatorially including or skipping selected protein-coding segments (exons) of the precursor mRNA before it is translated into protein. Thus, AS is a crucial intermediate stage between gene transcription and protein translation.
My work focuses on the computational analysis of biological data and encompasses structural genomics, transcriptomics, and proteomics. Individual projects range from elucidating disease etiology and uncovering the molecular mechanisms of action of alternative splicing to searching for protein expression-based biomarkers of treatment response and studying potential drug targets on the surface of the SARS-CoV-2 viral particle. Over the course of these studies, I designed a machine learning model that estimates the effect of AS on protein-protein interactions; developed a novel quantitative measure of the impact that alternatively spliced isoforms have on the biological system; predicted isoform stability using proteogenomic data and transfer learning; identified biomarkers of response to acupuncture treatment in Gulf War veterans affected by one of the most complex known acquired syndromes; and modeled SARS-CoV-2 protein complexes and simulated the entire viral envelope in solvent using molecular dynamics methods.
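As a toy illustration of how alternative splicing lets a single gene yield many products, the Python sketch below enumerates the isoforms that could arise from exon skipping alone. It is a simplifying assumption, not a model used in this work: the first and last exons are treated as always included, exon order is always preserved (exons are included or skipped, never reordered), and other splicing events such as alternative splice sites, intron retention, and mutually exclusive exons are ignored. The function name `exon_skipping_isoforms` and the exon labels are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Enumerate isoforms produced by skipping any subset of internal exons.

    Hypothetical toy model: the first and last exons are always kept,
    and the original exon order is preserved in every isoform.
    """
    if len(exons) < 2:
        # A gene with zero or one exon has nothing to skip.
        return [list(exons)]
    first, *internal, last = exons
    isoforms = []
    # From "keep all internal exons" down to "skip all of them".
    for k in range(len(internal), -1, -1):
        # combinations() preserves input order, so exon order is kept.
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


gene = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical five-exon gene
isoforms = exon_skipping_isoforms(gene)
print(len(isoforms))  # 8
for iso in isoforms:
    print("-".join(iso))
```

With n internal exons this model admits 2^n isoforms, which shows how quickly the combinatorial space grows; in practice, cells produce only some of these combinations.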
This work brings together two important aspects of modern omics studies, transcriptomics and proteomics, and highlights the importance of developing computational methods for the modern field of precision medicine.

  • etd-64081
Defense date: 2022
Date created: 2022-04-27