Overview of My Work on Protein Folding

In collaboration with my chemist colleague, Prof. Sharon Huo, my recent research on protein folding focuses on developing and applying various graph-based approaches for the study of protein conformations and folding pathways. We have achieved important improvements over earlier methods and obtained insightful results for the proteins under study.

One aspect of our work on algorithms is built upon the probabilistic roadmap methods (PRM), which were originally developed for robot motion planning problems and have been very successful in tackling many difficult problems including some in protein folding. The general PRM methodology is to construct a roadmap to capture important features of protein conformation space and energy landscape. In turn, rich thermodynamic and kinetic information can be extracted from the roadmap and further analyzed by graph-based tools. The construction of a roadmap involves the generation of nodes for conformations and edges for direct transitions between neighboring conformations.  The following figure is a schematic representation of the PRM approach applied to the Muller potential, showing the roadmap nodes (as black dots), edges (as blue lines) and the shortest path (in red) between two potential energy minima.

We have developed our MaxFlux-PRM method to incorporate the maximum flux principle into the PRM framework and adopt several innovative techniques to address the conformation space sampling, roadmap connection, and numerical precision issues, which are important for the computational efficiency and the quality of roadmaps and folding pathways. The shortest path as found in our roadmap maximizes an estimate of the reactive flux, or equivalently minimizes the mean first passage time (MFPT), at a given temperature. The computation results in our preliminary studies are very promising and in good consistency with theoretical analysis and experimental evidence.

The following figure illustrates information of the folding path of engrailed homeodomain (EnHD) (PDB entry: 1ENH (25)) found by our algorithm:  (a)-- effective energy (potential energy plus solvation free energy), radius of gyration (Rg), and the Ca RMSD with respect to the native state as a function of the folding steps, and (b)--representative conformations (a–f) along the folding path with (d) showing the superposition of the conformations from step 29 to 32. H1-H3 denotes the three helices.

Another aspect of our work is on applying graph algorithms in the analysis and information extraction of computation data, generated through PRM methods and/or other ones.   For example, we recently used the Max-Flow formulation and algorithm to study the transitions encoded in molecular dynamics (MD) simulation data of b-hairpin and human transthyretin peptide (TTR (105-115)) dimer.  The central data structure in this work is the minimum cut tree, also called TRansition Disconnectivity Graph (TRDG), which represents the minimum cut  values among all node pairs of a graph (e.g. based on MD data in our case).  The TRDG of TTR peptide dimer is shown below. The free energy surface indicates, among other things, that both parallel and anti-parallel aggregates are populated. Further, parallel aggregates with different alignment patterns are separated by non-negligible free energy barriers.

The following figure shows some example pathways of the TTR peptide that connect the antiparallel and parallel aggregates. The paths are numbered from I to III, and the conformations along the path are numbered from 1 to 9.

Part of our ongoing research is on the development and adaptation of various graph generation and analysis methods that explore the rich information encoded in the computation data of protein studies.