Is it possible to use machine learning to make predictions with little understanding about the underlying mechanisms, such as predicting the 3D structure of proteins out of the amino acid sequences?

Updated Nov 5

Predictions are already made mathematically, but they are only guesses.

One example is scanning a protein for segments that might be trans-membrane helices, by numerically scoring the hydrophobicity of the side chains in a given segment. Basically, if a stretch of 19 or more residues in a row is predominantly hydrophobic, it may form an alpha-helix that spans a lipid membrane.
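As a rough sketch of that kind of calculation, the Python below averages Kyte-Doolittle hydropathy values over a sliding window of 19 residues and reports the windows that score above a cutoff. The function name, the choice of the Kyte-Doolittle scale, and the 1.6 cutoff are my assumptions for illustration. A hit only means "possibly membrane-spanning", nothing more.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for every window at or above threshold.

    The window length and threshold are illustrative assumptions.
    Overlapping windows are all reported; merging them is left out.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits


# Toy sequence (invented): a 19-residue leucine stretch between charged ends.
toy = "DDDD" + "L" * 19 + "KKKK"
for start, score in hydrophobic_windows(toy):
    print(start, score)
```

A sequence containing letters outside the 20 standard amino acids would raise a KeyError here, which is probably what you want in a sketch like this.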

Another version is looking for amphipathic helices, which are hydrophobic on one side and hydrophilic on the other.
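One common way to put a number on this is a hydrophobic moment: treat each residue as a vector pointing out from the helix axis, rotated 100 degrees from the previous one (about 3.6 residues per turn), scale it by the residue's hydropathy, and add the vectors up. If the hydrophobic residues cluster on one face, the sum is large. The sketch below assumes ideal alpha-helix geometry and uses the Kyte-Doolittle scale; both are my choices for illustration, and the function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Mean hydrophobic moment of a segment, assuming an ideal alpha-helix."""
    step = math.radians(degrees_per_residue)
    sin_sum = sum(KD[aa] * math.sin(i * step) for i, aa in enumerate(segment))
    cos_sum = sum(KD[aa] * math.cos(i * step) for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


# Invented 18-residue examples: leucines on one face, lysines on the other,
# versus a segment that is hydrophobic all the way around.
amphipathic = "LKKLLKKLLKLLKKLLKK"
uniform = "L" * 18
print(hydrophobic_moment(amphipathic))
print(hydrophobic_moment(uniform))
```

The first value should come out clearly larger than the second, which is close to zero because 18 residues at 100 degrees each make exactly five full turns and the vectors cancel.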

There are certain “motifs” which appear in many proteins, and you can look for them mathematically by numerically scoring each side chain against patterns taken from previously solved structures. Then you could guess, “This may be a member of such-and-such category.” Or, “This has a domain that is structured similarly to other known proteins.”
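One way to do that per-residue scoring is a position-specific scoring matrix: line up known examples of a motif, count which amino acids show up at each position, and turn the counts into log-odds scores. The sketch below is only that, a sketch. The aligned examples are invented toy strings (not real motif instances), the background is assumed uniform, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Per-position log2-odds scores against a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_match(seq, pssm):
    """Return (start, score) of the highest-scoring window.

    Assumes seq is at least as long as the motif and uses only the
    20 standard amino acid letters.
    """
    width = len(pssm)
    scored = [
        (sum(col[aa] for col, aa in zip(pssm, seq[i:i + width])), i)
        for i in range(len(seq) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Invented toy alignment, for illustration only.
examples = ["GDSGK", "GESGK", "GDTGK", "GDSGR"]
pssm = build_pssm(examples)
print(best_match("MAAAGDSGKLLV", pssm))
```

A high score says "this stretch resembles the examples", which is the kind of guess described above, not a statement about the actual fold.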

You might also be able to say mathematically, “Some beta-sheets and beta-barrels look like…”

I think you can also look for a feature like an ATPase domain that might be predicted from the sequence.
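For instance, many ATPases (and GTPases) contain a nucleotide-binding P-loop, the Walker A motif, usually written as [AG]-x(4)-G-K-[ST]. A minimal sketch of searching for it with a regular expression is below. The test sequence is invented, the function name is hypothetical, and a match is only a hint that a nucleotide-binding site might be present.

```python
import re

# Walker A / P-loop consensus: [AG], any four residues, then G-K-[ST].
# The lookahead lets overlapping matches be reported as well.
WALKER_A = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_walker_a(seq):
    """Return (start, matched_text) for each Walker A-like match in seq."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(seq)]


# Invented toy sequence, for illustration only.
print(find_walker_a("MKTLLVGPSGSGKSTLAR"))
```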

It gets more and more complicated past that.

These predictions are only very general. There might never be a way to confidently machine-predict individual salt-bridges or metal interactions. Things which are physically close together in the native conformation may be very far apart in the primary sequence.
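To make that last point concrete: once a structure is solved, you can list the pairs of residues that are close in space but far apart in the sequence. The sketch below does that for a list of alpha-carbon coordinates. The 8 angstrom cutoff, the minimum separation of 12 residues, the function name, and the toy hairpin coordinates are all assumptions for illustration, and the geometry is not physically exact.

```python
import math


def long_range_contacts(ca_coords, max_dist=8.0, min_separation=12):
    """Return (i, j, distance) for residue pairs close in space but
    at least min_separation apart in the sequence."""
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= max_dist:
                contacts.append((i, j, round(d, 1)))
    return contacts


# Toy hairpin (invented coordinates): ten residues going out along x,
# then ten coming back alongside them, 5 angstroms away.
hairpin = [(3.8 * i, 0.0, 0.0) for i in range(10)]
hairpin += [(3.8 * (9 - i), 5.0, 0.0) for i in range(10)]

for i, j, d in long_range_contacts(hairpin):
    print(i, j, d)
```

In this toy chain the first and last residues end up right next to each other, even though they are 19 positions apart in the sequence. A sequence-only method has to guess pairings like that without seeing the coordinates.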

A prediction is a vague guess that is still a long way from solving the structure.

If a protein structure has already been solved, pharmaceutical companies use computer models to run through possible drug candidates, seeing if they might bind to a receptor or enzyme.
