The work done on the Abalone program.
The problem of protein folding is a very interesting challenge for molecular mechanics. First, proteins are large and complex molecules with a large number of degrees of freedom. There is no hope to cycle through all possible conformations of the protein, even of medium size. It’s interesting that the problem is not only in a lack of computer power. In nature, an exhaustive search is also impossible because of the astronomical number of possible conformations. The situation is called a Levinthal's paradox. But the protein somehow folds. Therefore, protein makes it not by an exhaustive search of all conformations but it finds some special way. It has been suggested that proteins have exactly those sequences that ensure the availability of these special ways for fast folding. This “fast” folding may take a second or more, which is beyond the capabilities of simulations on modern computers. Second, the energy gain of the natural conformation, compared with the “wrong” folded, is very small. This is a very serious challenge to the accuracy of the force field. In general, the problem seems hopelessly complicated. And instead of solving it directly, mimicking the natural process of folding, people use various workarounds. For example, deduce the structure of a new protein by analogy with the known.
It is interesting that direct protein folding using molecular dynamics still possible. Moreover, it can be done using force fields of 80 years of the last century, for example in the OPLS force field using the implicit water model (Generalized Born). Within a few nanoseconds of molecular dynamics at 350 K, a small protein «tryptophan cage» takes a very close to the experimentally determined conformation (http://www.biomolecular-modeling.com/Abalone/Protein-folding.html). Interestingly, in experimental conditions, the protein folds in 4 microseconds. Molecular dynamics makes it a 1000 times faster! The situation seemed hopeless suddenly turns into a very upbeat. How can this be? The first thing that comes to mind is the assumption that the water prevents the rapid protein folding. Indeed, calculations in explicit water in the AMBER demonstrate folding rate close to the experimental one. Secondly, the OPLS force field has been optimized to reproduce thermodynamic properties of fluids. It seems that it gives so good force’s balance between protein residues that this is sufficient to identify the preferred conformation of the protein. Third, Trp-Cage is the alpha-protein. Most of the current force fields favor the formation of alpha but not the beta structure. This alpha preference is well-known for the AMBER94 force field and for the Generalized Born model. Indeed, in the AMBER94 Trp-Cage quickly reaches the native conformation. In the AMBER96, which is optimized for beta-structure, we were unable to fold it.
The simplest approach is to use the preferred beta-conformation force fields such as AMBER96. Given that the Generalized Born model is considered as favorable for the alpha conformation, one can expect that their combination gives a kind of compromise, more or less suitable for both alpha and beta structures. This approach is used, and on this basis was folded several alpha and beta proteins. But it is clear that this is a very rough approach. As already mentioned, we could not be able to obtain the native structure of Trp-Cage. We need force fields sufficiently accurate to describe all the conformation. A more recent version the AMBER99 superior to AMBER96, but it is still mentioned as favorable to the alpha conformation. In order to balance the alpha-beta conformational preference several modifications of this force field were published. In the AMBER99SB version we were able to fold as alpha as well as beta structures under the same conditions. For this we have had to use not a molecular dynamics at constant temperature but the Replica exchange method. Obtaining the Trp-Cage native conformation requires an order longer simulation than it was in the AMBER94. 10-membered polypeptide Chignolin also obtains its native conformation of beta-hairpins in a few tens of nanoseconds.
Build the Chignolin sequence (GYDPETGTWG). Choose an AMBER99SB field of force and Generalized Born water model. Eliminate the overlap of the atoms. Copy the six replicas in the temperature range from 273.15 to 400 K.
5 ns per replica (a total of 30 ns). The calculation takes about a day on a single processor core. The figure shows a typical set of conformations: one or two beta-hairpins at low temperatures and a variety of conformations at higher temperatures. Elements of the alpha helix are rare.
So in principle it’s possible to fold the protein domains using the molecular dynamics method. In the case of small polypeptides, this process can proceed quickly, in just a day on a personal computer. To the success important a good choice of the force field (we recommend AMBER99SB and OPLS), an implicit consideration of water (Generalized Born method) and the protocol allows, on the one hand, sufficient long-term simulations, on the other, providing overcome of the local minima. As such, we use the Replica Exchange in the 273-400 K temperature range. For the smallest proteins six replicas are sufficient. Unfortunately, with increasing size of the protein a greater number of replicas require.