This research project studies a specific biological phenomenon known as “protein folding” and constructs new, advanced statistical methods for inferring model parameters.
Protein folding is a spontaneous process that transforms a disordered polymer into a specific three-dimensional structure. The correct three-dimensional structure is essential to a protein's function, although some parts of functional proteins may remain “unfolded”. Beyond this central role in biology, protein folding is also associated with a wide range of human diseases. In cystic fibrosis, for example, mutations result in faulty protein folding and hence a lack of functional protein. In many neurodegenerative diseases, such as Alzheimer's disease, proteins “misfold” into toxic protein structures.
Mathematical modelling offers a simplified, and hence manageable, representation of biological reality. Like all biological phenomena, protein folding has inherently “noisy” kinetics: its dynamics produce erratic behaviour that appears random, and a mathematical model must take this into account.
In this research project we will perform computer-assisted molecular dynamics simulations of protein folding and develop a simplified mathematical representation of these simulations using the theory of stochastic differential equations (SDEs).
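To give a flavour of what an SDE representation looks like, the following is a minimal sketch and not the model developed in the project. It assumes a hypothetical one-dimensional "folding coordinate" X moving in a double-well potential (one well standing in for the unfolded state, the other for the folded state), dX = -theta (X^3 - X) dt + sigma dW, simulated with the standard Euler-Maruyama scheme. The function name, the parameter values, and the Gaussian measurement error added to the path are all illustrative assumptions.

```python
import numpy as np

def simulate_double_well(theta, sigma, x0=-1.0, dt=0.01, n_steps=5000,
                         obs_sd=0.1, seed=0):
    """Euler-Maruyama simulation of dX = -theta*(X^3 - X) dt + sigma dW.

    Hypothetical toy model: the wells at x = -1 and x = +1 play the role
    of "unfolded" and "folded" states. Returns the latent path x and a
    noisy observed version y = x + measurement error.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Brownian increments over a step of length dt have variance dt.
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        drift = -theta * (x[k] ** 3 - x[k])
        x[k + 1] = x[k] + drift * dt + sigma * dw[k]
    # Assumed observation model: independent Gaussian measurement error.
    y = x + rng.normal(0.0, obs_sd, size=x.shape)
    return x, y

if __name__ == "__main__":
    x, y = simulate_double_well(theta=1.0, sigma=0.7)
    print("fraction of time in the right-hand well:", np.mean(x > 0))
```

Only the noisy series y would be available to the statistician in such a setting; the latent path x and the parameters theta and sigma are the unknowns.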
A key aspect of the project is the development of new statistical tools to estimate the unknown quantities in our SDE models. Statistical inference for model parameters is necessary but also very complicated, because we consider multidimensional models of dynamics whose exact values cannot be recorded, since the observations are affected by measurement error. Such complications call for computationally intensive statistical methodology, whose practical benefits are often limited when large datasets must be analysed with relatively little computational power. We will therefore develop statistical methods that avoid exact computation of the functionals underlying gold-standard statistical methods (i.e. the likelihood function). We will resort instead to approximate methods, known as “likelihood-free” methods, such as approximate Bayesian inference, synthetic likelihoods, and other particle methods based on sequential Monte Carlo.
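The likelihood-free idea can be illustrated with a minimal sketch of rejection sampling in the style of approximate Bayesian computation (ABC), one of the simplest schemes of this kind and not the methodology to be developed in the project. Instead of evaluating the likelihood, parameters are drawn from a prior, data are simulated from the model, and only the parameters whose simulated data resemble the recorded data are kept. Everything specific here is an assumption made for illustration: the toy Ornstein-Uhlenbeck model dX = -theta X dt + sigma dW observed with Gaussian error, the uniform prior, the two summary statistics, and all function names.

```python
import numpy as np

def simulate_ou(theta, sigma, rng, dt=0.05, n_steps=400, obs_sd=0.1):
    """Simulate noisy observations of dX = -theta*X dt + sigma dW.

    theta may be an array, in which case one path per value is simulated.
    Returns an array of shape (n_steps, n_paths).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    x = np.zeros(theta.shape[0])
    ys = np.empty((n_steps, theta.shape[0]))
    for k in range(n_steps):
        # Euler-Maruyama step, then add measurement error.
        x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)
        ys[k] = x + obs_sd * rng.normal(size=x.shape)
    return ys

def summaries(ys):
    """Two summary statistics per path: log-variance and lag-1 autocorrelation."""
    var = ys.var(axis=0)
    yc = ys - ys.mean(axis=0)
    lag1 = (yc[1:] * yc[:-1]).mean(axis=0) / var
    return np.stack([np.log(var), lag1], axis=1)

def abc_rejection(y_obs, n_sims=4000, keep_frac=0.02, sigma=1.0, seed=1):
    """Keep the prior draws of theta whose simulated summaries are closest
    to the observed ones (sigma is treated as known for simplicity)."""
    rng = np.random.default_rng(seed)
    theta_prior = rng.uniform(0.1, 5.0, size=n_sims)   # assumed prior
    s_sim = summaries(simulate_ou(theta_prior, sigma, rng))
    s_obs = summaries(y_obs.reshape(-1, 1))
    scale = s_sim.std(axis=0)                          # put summaries on one scale
    dist = np.linalg.norm((s_sim - s_obs) / scale, axis=1)
    n_keep = max(1, int(keep_frac * n_sims))
    return theta_prior[np.argsort(dist)[:n_keep]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_obs = simulate_ou(1.5, 1.0, rng)[:, 0]           # synthetic "recorded" data
    accepted = abc_rejection(y_obs)
    print("approximate posterior mean of theta:", accepted.mean())
```

The accepted values form a sample from an approximation to the posterior distribution of theta. The quality of that approximation depends on the choice of summary statistics and on the acceptance fraction, which is one reason more refined likelihood-free methods are of interest.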
In summary, the aim of our project is to develop suitable mathematical models for protein folding kinetics, together with the statistical methods that enable the estimation of unknown features from recorded data.