AlphaFold is a computer program that accurately predicts the three-dimensional shapes of proteins. It was developed by DeepMind, the Google-owned AI company, which publishes its predictions in partnership with the European Molecular Biology Laboratory (EMBL). Proteins are essential biomolecules that perform a vast array of functions in all living organisms, and their three-dimensional structure, or conformation, is critical to their function. However, predicting a protein's structure from its sequence is one of the most challenging problems in biology. AlphaFold represents a major advance in protein structure prediction and should help accelerate progress in many areas of basic research and drug discovery.
AlphaFold AI Predicts the Shape of Nearly Every Known Protein from Sequence Data
DeepMind’s AlphaFold tool has predicted the structures of around 200 million proteins. From now on, looking up the 3D structure of almost any protein known to science will be as easy as running a Google search.
Researchers have used AlphaFold, a groundbreaking artificial-intelligence (AI) network, to predict the structures of more than 200 million proteins from around 1 million species, covering almost every known protein on Earth.
The predictions are freely accessible in a database created by DeepMind, the Google-owned, London-based AI company that developed AlphaFold, and the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL–EBI), a non-profit organisation located near Cambridge, UK.
At a press conference, DeepMind CEO Demis Hassabis said, “Essentially, you can think of it as covering the whole protein universe.” He added that a new age of digital biology is beginning.
The function of a protein in the cell is determined by its three-dimensional shape, or structure. Many drugs are designed using structural information, and building an accurate map of how a protein's amino acids are arranged is often the first step in learning how the protein works.
DeepMind created the AlphaFold network using a form of AI known as deep learning. The AlphaFold database was released a year ago with more than 350,000 structure predictions covering nearly every protein produced by humans, mice, and 19 other widely studied organisms. The catalogue had since grown to around one million entries, and the newly expanded database is over 23 terabytes in size.
Christine Orengo, a computational biologist at University College London who has used the AlphaFold database to uncover novel protein families, says, “We’re preparing for the release of this massive treasure.” She adds that it is a great help to have all of these predictions made in advance.
Quality Structures
Since the release of AlphaFold a year ago, members of the life-sciences community have been eager to try the software. The network generates highly accurate predictions of the structures of many proteins. It also reports how confident it is in each prediction, so that researchers can judge whether the prediction can be relied upon. Traditionally, scientists have determined protein structures with experimental methods such as X-ray crystallography and cryo-electron microscopy, which are time-consuming and expensive.
According to EMBL–EBI, around 35% of the more than 214 million predictions are considered highly accurate, meaning that they are comparable to experimentally determined structures. Another 45% are deemed accurate enough for many applications.
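To put those percentages into absolute numbers, the short sketch below applies them to the 214 million figure quoted above. It is only a back-of-the-envelope calculation: the inputs are the rounded values from the text, and the names (`estimate_counts`, `highly_accurate`, `usable`) are illustrative labels, not EMBL–EBI terminology.

```python
# Rounded figures quoted in the text (attributed to EMBL-EBI).
TOTAL_PREDICTIONS = 214_000_000
PERCENT_HIGHLY_ACCURATE = 35   # comparable to experimental structures
PERCENT_USABLE = 45            # accurate enough for many applications


def estimate_counts(total, percentages):
    """Convert whole-number percentages into approximate prediction counts."""
    return {name: total * pct // 100 for name, pct in percentages.items()}


counts = estimate_counts(
    TOTAL_PREDICTIONS,
    {"highly_accurate": PERCENT_HIGHLY_ACCURATE, "usable": PERCENT_USABLE},
)
# Whatever is left over falls below both accuracy categories.
counts["remaining"] = TOTAL_PREDICTIONS - sum(counts.values())

for name, value in counts.items():
    print(f"{name}: about {value / 1e6:.1f} million predictions")
```

On these rounded inputs, roughly 75 million predictions fall in the top category and about 96 million in the second, leaving around 43 million lower-confidence models.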
Many AlphaFold structures are good enough to stand in for experimental structures in some applications. In other cases, researchers use AlphaFold predictions to validate and interpret experimental data. Poor predictions are usually easy to spot, and some of them are caused by intrinsic disorder in the protein itself, which means it has no defined shape, at least when it is not in the presence of other molecules.
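One practical way to spot such regions is AlphaFold's per-residue confidence score, pLDDT, which runs from 0 to 100 and is written into the B-factor column of the PDB-format files the database distributes. Scores below about 50 are conventionally treated as very low confidence and often coincide with disordered stretches. The sketch below is a minimal illustration, not the official tooling: the helper names, the `min_length` cut-off, and the synthetic records built by `fake_ca_line` are all assumptions made for the example.

```python
def ca_plddt(pdb_text):
    """Return (residue_number, pLDDT) for every alpha-carbon ATOM record.

    Uses the fixed PDB columns: atom name 13-16, residue number 23-26,
    B-factor 61-66 (AlphaFold files store pLDDT in the B-factor field).
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((int(line[22:26]), float(line[60:66])))
    return scores


def low_confidence_runs(scores, threshold=50.0, min_length=3):
    """Group neighbouring residues with pLDDT below threshold into (start, end) runs.

    Residues are treated as neighbours in file order; short runs are ignored.
    """
    runs, current = [], []
    for resnum, plddt in scores:
        if plddt < threshold:
            current.append(resnum)
            continue
        if len(current) >= min_length:
            runs.append((current[0], current[-1]))
        current = []
    if len(current) >= min_length:
        runs.append((current[0], current[-1]))
    return runs


def fake_ca_line(resnum, plddt):
    """Build a synthetic, column-aligned CA record (hypothetical data for the demo)."""
    return (f"ATOM  {resnum:>5}  CA  ALA A{resnum:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Made-up scores for nine residues; residues 3-6 form a low-confidence stretch.
demo_scores = [92.1, 88.4, 45.0, 38.2, 41.7, 47.9, 76.3, 30.5, 95.0]
pdb_text = "\n".join(fake_ca_line(i + 1, s) for i, s in enumerate(demo_scores))

print(low_confidence_runs(ca_plddt(pdb_text)))
```

A low score does not prove that a region is disordered. It only flags where the model should not be trusted without further evidence.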
The 200 million predictions are based on the sequences in another database, called UniProt. Eduard Porta Pardo, a computational biologist at the Josep Carreras Leukaemia Research Institute (IJC) in Barcelona, Spain, says that scientists are likely to have some idea of the shapes of some of these proteins already, either because they appear in databases of experimental structures or because they resemble other proteins in such repositories.
However, such databases tend to be skewed towards human, mouse, and other mammalian proteins, according to Porta. Because the AlphaFold release covers a much wider variety of organisms, it is likely to add substantial new knowledge, and he expects it to be an excellent resource.
Because the AlphaFold software has been available for a year, researchers can already predict the structure of any protein they choose. But many say that having the predictions in a single database will save time, money, and hassle. “You are removing another barrier to entry,” says Porta. “I’ve used several AlphaFold models and I have never run AlphaFold myself.”
Jan Kosinski, a structural modeller at EMBL Hamburg in Germany who has been running the AlphaFold network for the past year, is eagerly anticipating the expansion of the database. His team once spent three weeks predicting the proteome (the collection of all proteins in an organism) of a virus. During the briefing, he said, “Now we can just download every model.”
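For a single protein, downloading a model can be as simple as fetching one file by its UniProt accession. The sketch below assumes the file-naming pattern the public AlphaFold database has used (`AF-<accession>-F1-model_v<version>.pdb` under `https://alphafold.ebi.ac.uk/files`). The version suffix changes between releases, so treat the default of `4` and the helper names as assumptions to check against the database's current documentation. The accession `P69905` (human haemoglobin subunit alpha) is used purely as an example.

```python
import urllib.request

# Assumption: base URL and naming pattern of the public AlphaFold database.
# The version number is release-dependent and may need updating.
BASE_URL = "https://alphafold.ebi.ac.uk/files"


def model_url(uniprot_accession, version=4, fragment=1):
    """Build the expected download URL for one predicted structure."""
    return f"{BASE_URL}/AF-{uniprot_accession}-F{fragment}-model_v{version}.pdb"


def download_model(uniprot_accession, destination, version=4):
    """Fetch one model and write it to `destination` (needs network access)."""
    url = model_url(uniprot_accession, version=version)
    with urllib.request.urlopen(url, timeout=30) as response:
        data = response.read()
    with open(destination, "wb") as out:
        out.write(data)
    return destination


# Only the URL is built here; call download_model(...) to actually fetch the file.
print(model_url("P69905"))
```

Bulk downloads of whole proteomes are normally served as separate archives, so this per-protein approach suits a handful of lookups rather than every model for an organism.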
Twenty-Three Trillion Bytes
Including almost every known protein in the database will also enable new kinds of research. Orengo and her colleagues have used the AlphaFold database to find new protein families, and they will now do so on a much larger scale. She and her colleagues will also use the expanded database to better understand the evolution of proteins with useful properties, such as the ability to digest plastic, or worrying ones, such as the potential to cause cancer. Finding distant relatives of these proteins in the database may reveal the origin of their properties.
Martin Steinegger, a computational biologist at Seoul National University who helped design a cloud-based version of AlphaFold, is enthusiastic about the database’s growth. However, he believes that researchers will probably still need to run the network themselves. People are increasingly using AlphaFold to work out how proteins interact, and such predictions are not in the database. Nor are the microbial proteins identified by sequencing genetic material from soil, ocean water, and other so-called “metagenomic” sources.
Some advanced applications may require downloading the complete 23-terabyte contents of the expanded AlphaFold database, which Steinegger says will not be feasible for many researchers, and cloud-based storage could also be expensive. Foldseek, a program co-created by Steinegger, can rapidly identify structurally related proteins and should also be able to compress the AlphaFold data significantly.
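A rough calculation shows why a full download is impractical for many groups. The sketch below uses only the two figures quoted in this article (23 terabytes and about 214 million predictions) and assumes decimal units. The link speeds are hypothetical examples, and the estimate ignores protocol overhead, compression, and whatever extra files the real archive contains.

```python
# Figures quoted in the article; decimal terabytes are an assumption.
DATABASE_BYTES = 23 * 10**12
PREDICTIONS = 214_000_000


def average_entry_kilobytes(total_bytes, entries):
    """Average storage per prediction, in kilobytes (1 kB = 1000 bytes)."""
    return total_bytes / entries / 1000


def download_days(total_bytes, megabits_per_second):
    """Ideal transfer time in days at a sustained link speed, ignoring overhead."""
    seconds = total_bytes * 8 / (megabits_per_second * 10**6)
    return seconds / 86_400


print(f"average entry: {average_entry_kilobytes(DATABASE_BYTES, PREDICTIONS):.0f} kB")
for speed in (100, 1000):  # hypothetical link speeds in Mbit/s
    print(f"{speed} Mbit/s: {download_days(DATABASE_BYTES, speed):.1f} days")
```

At a sustained 100 Mbit/s the transfer alone would take about three weeks. That is why tools that search or compress the data remotely matter for groups without large storage budgets.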
Even though the AlphaFold database contains almost every known protein, it will need to be updated as new organisms are discovered. The accuracy of AlphaFold’s predictions may also improve as new structural data become available. Hassabis says that DeepMind has committed to maintaining the database indefinitely and that he anticipates annual updates.
He expects the availability of the AlphaFold database to have a lasting effect on the life sciences. It will require a substantial change in thinking.