DeepMind's AlphaFold2 AI Solves 50-Year-Old Biology Challenge

The Protein Structure Prediction Center announced that AlphaFold2, an AI system developed by DeepMind, has solved its Protein Structure Prediction challenge. AlphaFold2 achieved a median score of 92.4 on the Global Distance Test (GDT) metric, above the threshold considered competitive with traditional methods.

The Center made the announcement in a press release describing the results of the 14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP14). Inspired by biochemist Christian Anfinsen's 1972 Nobel Prize acceptance speech, the challenge is to find computational methods that predict a protein's 3D structure from its amino acid sequence. By achieving a GDT score above 90, which is on par with experimental techniques such as X-ray crystallography and cryo-electron microscopy, AlphaFold2 is considered to have solved the challenge. According to UC Davis researcher and CASP14 co-organizer Andriy Kryshtafovych:

"Being able to investigate the shape of proteins quickly and accurately has the potential to revolutionize life sciences. Now that the problem has been largely solved for single proteins, the way is open for the development of new methods for determining the shape of protein complexes – collections of proteins that work together to form much of the machinery of life, and for other applications."

Genetic codes in DNA are "recipes" for creating protein molecules from sequences of amino acids. Although these sequences are linear, the resulting proteins are folded into complex 3D structures which are key to their biological function. Scientists can experimentally determine structure using techniques such as nuclear magnetic resonance, X-ray crystallography, and cryo-electron microscopy. However, these methods require expensive specialized equipment and may take years to complete for a single structure.

In 1972, Anfinsen postulated that a protein's structure should be fully determined by its amino acid sequence. In 1994, CASP was founded as a biennial evaluation of computational models to predict protein structure from sequence. Entrants are given sequences of proteins whose structures have been determined experimentally but have not been published. Prediction results are evaluated using GDT, which measures the similarity between a known structure and a predicted structure on a scale of 0 to 100. A score of 90 or above is considered a success.
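To make the 0-to-100 scale concrete, here is a minimal Python sketch of a GDT-style score. It is not CASP's evaluation code. It assumes the commonly used GDT_TS form, which averages the fraction of residues whose alpha-carbon lies within 1, 2, 4, and 8 ångströms of the experimental position, and it assumes the two structures are already superimposed (the real metric searches over many superpositions). The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed alpha-carbon traces.

    Assumption: no superposition search is performed here, so this is
    only an approximation of how the score is computed in practice.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    # Distance, in angstroms, between each predicted atom and its reference atom.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the fractions and rescale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example; the coordinates are made up for illustration.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(gdt_ts(predicted, reference))  # 56.25
```

In the toy example the per-residue errors are 0.5, 1.5, 3.0, and 9.0 ångströms, so one, two, three, and three of the four residues pass the respective cutoffs, giving 56.25. A score above 90 requires nearly every residue to sit within a couple of ångströms of its experimental position.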

AlphaFold2 uses an attention-based neural network that models protein structure as a spatial graph. Besides the raw amino acid sequence, the input to the network includes multiple sequence alignment (MSA) information, which links several different sequences on the assumption that they share a common evolutionary ancestor. For training data, DeepMind used the Protein Data Bank's publicly available dataset of around 170k sequences. Training was run on 16 TPUv3s and took "over a few weeks."
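As a rough illustration of what MSA information looks like, the Python sketch below builds a per-column frequency profile from a tiny alignment. This is not how AlphaFold2 consumes MSAs, since those details have not been released. The sequences, the `-` gap symbol, and the function names `column_profile` and `conservation` are all made up for the example.

```python
from collections import Counter

# Hypothetical alignment: each row is one related sequence, "-" marks a gap.
msa = [
    "MKT-AYIA",
    "MKTQAYLA",
    "MRT-AYIA",
    "MKTQSYIA",
]


def column_profile(alignment):
    """Return, for each aligned column, the frequency of every symbol in it."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")

    profile = []
    for column in zip(*alignment):  # iterate over columns, not rows
        counts = Counter(column)
        profile.append({symbol: n / len(column) for symbol, n in counts.items()})
    return profile


def conservation(alignment):
    """Frequency of the most common symbol (gaps included) in each column."""
    return [max(column.values()) for column in column_profile(alignment)]


print(conservation(msa))
# [1.0, 0.75, 1.0, 0.5, 0.75, 1.0, 0.75, 1.0]
```

Columns that stay constant across related sequences, and pairs of columns that vary together, carry evolutionary hints about which residues matter structurally. That is the kind of signal an alignment adds beyond a single raw sequence.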

Although the full details of AlphaFold2's architecture have not been released, DeepMind published a paper in Nature describing the previous iteration of AlphaFold, which won first place in CASP13 in 2018 with a GDT score of around 60. DeepMind also open-sourced some of that system's code.

On Twitter, biologist Mohammed AlQuraishi, who developed the first end-to-end model for protein structure prediction, described AlphaFold2's results as "astounding." In a detailed blog post, he also praised the system's accuracy but criticized DeepMind's academic communication:

"What was [presented at] CASP14 barely resembled a methods talk. It was exceedingly high-level, heavy on ideas and insinuations but almost entirely devoid of detail. This is a shame and contrasts markedly with DeepMind’s participation in CASP13, when they gave two talks that provided sufficient details for many groups to reproduce their results right away...."

In addition to competing in the CASP14 competition, DeepMind used AlphaFold2 to predict structures of several of the proteins from SARS-CoV-2, the virus that causes COVID-19. DeepMind published these results, and later experiments confirmed some of the predictions. The COVID-19 protein structure predictions from DeepMind and others are available on the CASP website.
