Biology has swiftly become a data-centered science, driven by technological advances in data acquisition and by research in genomics and, by extension, in all omics domains. One of these, proteomics, is the target of this project proposal, in which we plan to analyze, mostly with machine learning (ML) methods, a specific type of cell-membrane protein of biological relevance, namely the G Protein-Coupled Receptors (GPCRs). In recent decades, more than 50% of drugs have targeted only four key protein families; of these, almost 30% correspond to GPCRs. These receptors have become an important target for drugs in new therapies, particularly in areas such as pain, anxiety, and neurodegenerative disorders. The proposal therefore focuses on the area of pharmaco-proteomics. The functional properties of proteins depend on their tertiary structure (3-D configuration), and progress in determining this structure through crystallography has been slow. In our previous research, we used ML methods to analyze, from their primary amino acid sequences, a sub-family of GPCRs with no known full 3-D configuration. In this proposal, we move from this approach towards the state of the art in the field, which is the analysis of GPCRs from their Molecular Dynamics (MD). MD is a computational simulation approach that allows exploring the conformational space of GPCRs by generating large-scale synthetic time series of protein conformations, in which the raw data correspond to the spatial positions of the protein components. Ligands are molecules that bind, in different ways, to the GPCR. One reason for the pharmacological interest in GPCRs is that the richness of their ligand space makes the molecular space for drug design more extensive. Types of ligands related to different pharmacological effects include agonists, neutral antagonists, and inverse agonists.
Rather than relying on 3-D snapshots of the receptors, MD simulation allows exploring their dynamic behavior and, therefore, the different molecular conformations that different ligands induce and that determine the ultimate physiological response. This poses a problem of knowledge extraction from large-scale data that is ideally suited to ML-based approaches. Such approaches are still at a very early stage in this domain, and our project aims to help fill this gap by investigating several aspects of the problem: the analysis of different strategies for transforming raw MD data into representations suitable for advanced ML-based methods; the analysis of the transformed MD data, both including and excluding the time domain, using deep learning; the adaptation of the ML-based approaches to the identification of specific receptor motifs that are relevant as determinants of receptor function or pharmacological effect; the visual exploration of conformational dynamics using advanced latent representation models for multivariate time series; and the definition of methods to increase the interpretability and explainability of the results obtained. Our proposal aligns with the European Horizon 2020 research Societal Challenge 1: Health, Demographic Change and Wellbeing, for which one of our international team members was Chair of the Expert Advisory Group in the period 2016-18. As a result, it also aligns with the corresponding Spanish challenge. Its basic and applied research is expected to have an impact as a precursor of findings in the area of pharmaco-proteomics that may assist drug design.
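As an illustration of the first aim listed above (transforming raw MD data into representations suitable for ML), the following is a minimal sketch and not the method proposed by the project. It assumes a trajectory is available as a NumPy array of shape (frames, atoms, 3) holding Cartesian positions, and it maps each frame to its vector of pairwise inter-atomic distances, one common choice because distances do not change when the whole molecule is rotated or translated. The function name `pairwise_distance_features` and the synthetic random trajectory are hypothetical and used only for illustration.

```python
import numpy as np


def pairwise_distance_features(traj):
    """Map an MD trajectory to per-frame pairwise-distance features.

    traj: array-like of shape (n_frames, n_atoms, 3) with Cartesian positions.
    Returns an array of shape (n_frames, n_atoms * (n_atoms - 1) // 2).
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim != 3 or traj.shape[2] != 3:
        raise ValueError("traj must have shape (n_frames, n_atoms, 3)")

    # Differences between every pair of atoms in every frame: (F, N, N, 3).
    diffs = traj[:, :, None, :] - traj[:, None, :, :]
    # Euclidean distance for every pair: (F, N, N).
    dists = np.linalg.norm(diffs, axis=-1)

    # Keep each unordered pair once (strict upper triangle of the matrix).
    i, j = np.triu_indices(traj.shape[1], k=1)
    return dists[:, i, j]


# Synthetic stand-in for simulation output (assumption: 100 frames, 10 atoms).
rng = np.random.default_rng(0)
synthetic_traj = rng.normal(size=(100, 10, 3))
features = pairwise_distance_features(synthetic_traj)
```

Each row of `features` is one time step, so the matrix can be passed to a model either row by row (excluding the time domain) or as an ordered sequence (including it). For a real receptor the number of pairs grows quadratically with the number of atoms, so in practice a subset of atoms or residues would have to be selected first.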
PLAN ESTATAL DE INVESTIGACIÓN CIENTÍFICA Y TÉCNICA Y DE INNOVACIÓN 2017-2020
PROGRAMA ESTATAL DE I+D+I ORIENTADA A LOS RETOS DE LA SOCIEDAD