Damiano Clementel

Neuroscience, Technology and Society, XXXVII series 

Grant sponsor

Silvio Tosatto



Project description

Proteins are molecules that play a key role in living organisms. They are responsible for the large majority of biological functions performed in the cell, including biochemical reactions and cell regulation. The function of a protein is strictly related to the arrangement of its atoms in three-dimensional space, also called its native fold. The widely known protein structure-function paradigm holds that a given protein sequence folds, through the protein folding process, into a particular structure, which is in turn associated with a specific function. However, predicting the folded structure of a protein from information about its composition (the protein sequence) is an NP-hard problem, while experimental structure determination techniques are demanding in terms of resources and time. This contrasts with the comparatively easy task of determining the sequence of a protein. For decades, predicting protein structure from sequence has been a leading topic in bioinformatics. A recent breakthrough in the field has been AlphaFold, a sequence-to-structure method developed by DeepMind (Google). Despite its high accuracy, AlphaFold works well mainly for proteins that have a fixed three-dimensional conformation. Other proteins (or regions) whose structure changes dynamically over time or in response to environmental signals are poorly modeled by AlphaFold. The majority of them are classified as non-globular and include disordered (IDP), linearly interacting (LIP) and tandem repeat (TRP) proteins. They are generally underrepresented in public databases, and their dynamic properties are extremely difficult to determine experimentally at the atomic level. The first part of my project is to improve and integrate the ecosystem of applications and databases that focus on non-globular proteins, in order to increase the amount of high-quality data available.
In the second part of the project, the generated and augmented data will be analyzed and used to enhance prediction models for non-globular proteins and regions. The third objective is to integrate both the outcomes of the data analysis and the upgraded models into the core data resources, making them available to the broader scientific community. The final objective is to create a cycle of continuous improvement in our knowledge of dynamic biological protein systems.