Established in 2020 Wednesday, June 12, 2024

Revealing the secrets of protein evolution using the AlphaFold database
Image courtesy: Karen Arnott/EMBL-EBI.

CAMBRIDGESHIRE.- By developing an efficient way to compare all predicted protein structures in the AlphaFold database, researchers have revealed similarities between proteins across different species. This work aids our understanding of protein evolution and has uncovered new insights into the origin of human immunity proteins.

The research was conducted by EMBL's European Bioinformatics Institute (EMBL-EBI), the Institute of Molecular Systems Biology at ETH Zurich, and the School of Biological Sciences at Seoul National University.

The AlphaFold database is a transformative resource in the field of protein research, serving as a comprehensive repository of AI-predicted 3D structures for all known proteins. The database fills a critical gap in understanding protein function and evolution by offering high-quality structural predictions. Although AI predictions are not a substitute for experimentally determined structures, they do provide invaluable insights for the scientific community.

For this study, published in the journal Nature, the researchers developed a new algorithm, Foldseek Cluster, that can analyze large sets of protein structures all at once. Foldseek Cluster was applied to the 200 million predicted protein structures in the AlphaFold database and identified over 2 million unique structural clusters, that is, groups of protein structures with similar three-dimensional shapes. One third of these clusters lack any previous annotation, meaning they had not previously been described or categorized.
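
To make the idea of a structural cluster concrete, here is a minimal toy sketch in Python. It is not the Foldseek Cluster algorithm; it only illustrates grouping items by similarity. It assumes each structure has already been reduced to a small numeric descriptor (a hypothetical stand-in for a real 3D structure comparison) and greedily assigns each item to the first cluster whose representative lies within a distance threshold. The names `greedy_cluster` and `toy_structures`, the descriptor values, and the threshold are all illustrative assumptions.

```python
from math import dist


def greedy_cluster(items, threshold):
    """Greedy representative clustering (toy illustration only).

    items: list of (name, descriptor) pairs, where descriptor is a tuple of floats.
    An item joins the first cluster whose representative is within `threshold`;
    otherwise it starts a new cluster and becomes that cluster's representative.
    """
    clusters = []  # list of (representative_descriptor, member_names)
    for name, descriptor in items:
        for representative, members in clusters:
            if dist(representative, descriptor) <= threshold:
                members.append(name)
                break
        else:
            # No existing representative is close enough: open a new cluster.
            clusters.append((descriptor, [name]))
    return [members for _, members in clusters]


# Hypothetical descriptors; real structures would need a proper structural comparison.
toy_structures = [
    ("protA", (0.0, 0.0)),
    ("protB", (0.1, 0.0)),
    ("protC", (5.0, 5.0)),
    ("protD", (5.1, 4.9)),
    ("protE", (10.0, 0.0)),
]

result = greedy_cluster(toy_structures, threshold=0.5)
print(result)
```

In this toy input, `protA` and `protB` end up together, as do `protC` and `protD`, while `protE` forms a cluster of its own. Comparing each new item only against cluster representatives, rather than against every other item, is one common way to keep clustering affordable on large collections.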

Bridging the gap in protein science

Proteins are critical to the processes that take place in the cell, and understanding protein structure is pivotal for studying protein function and evolution. Despite significant advances in sequence-based prediction of protein structures, computational limitations have made it difficult to study these structures at scale. Foldseek Cluster now enables structural comparison and clustering at an unprecedented scale, reducing the time for such tasks by several orders of magnitude.
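
As a rough back-of-the-envelope check on the scale of that speedup, the short Python sketch below uses the researchers' own estimate quoted in this article: about a decade with established methods versus five days with Foldseek Cluster. Treating a decade as 3,650 days is a simplifying assumption, and both figures are estimates rather than measurements.

```python
from math import log10

DAYS_PER_YEAR = 365                      # simplifying assumption, ignores leap days
established_days = 10 * DAYS_PER_YEAR    # "a decade", as estimated by the researchers
foldseek_cluster_days = 5                # "five days", as quoted in the article

speedup = established_days / foldseek_cluster_days
orders_of_magnitude = log10(speedup)

print(f"Estimated speedup: about {speedup:.0f}x")
print(f"Orders of magnitude: about {orders_of_magnitude:.1f}")
```

Under these assumptions the estimate works out to a speedup of roughly 730 times, or close to three orders of magnitude.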

"We've entered a new era in structural biology where computational methods unlock unprecedented access to explore the protein universe," said Martin Steinegger, Assistant Professor at the School of Biological Sciences Seoul National University.

"We estimated that clustering all structures with established methods would have taken a decade when compared to the five days it took using our new method, Foldseek Cluster. Our algorithm can sift through millions of predicted protein structures in the AlphaFold database and cluster them based on their 3D shapes. This acceleration in computational power doesn't just make things faster; it makes things possible."

Protein evolution and immunity

The study also delves into the evolutionary implications of these clusters. While most clusters are ancient in origin, around 4% appear to be species-specific. This offers new insights into evolutionary phenomena such as de novo gene birth—when new genes arise from non-coding regions of the genome. The work also illustrates several examples of evolutionary relationships that could enrich our understanding of protein function across different species, including their role in human immunity.

"This work isn't just about making comparisons more efficiently, it's about gaining new insights into the evolutionary history of proteins," said Pedro Beltrao, Associate Professor at the Institute of Molecular Systems Biology, ETH Zurich.

"One of the most interesting findings from this study is our detection of structural similarities between human immune system proteins and those found in bacteria. This suggests that proteins involved in the immune system may have ancient evolutionary origins that we share with bacterial species. If true, this could reshape our understanding of immunity. Our research not only advances current knowledge but also lays out a roadmap for future investigations into the mysteries of protein function and evolution."

Improving the AlphaFold database functionality

As the AlphaFold database and other life science databases continue to grow, there is a significant need to help users sift through the vast amount of data while reducing the computational cost of analyzing and managing it. Approaches such as the Foldseek Cluster algorithm, which is scalable to billions of structures, will be invaluable in helping researchers navigate this wealth of information.

"Foldseek Cluster is more than just a technological advancement; it's an enhancement that elevates the entire AlphaFold database experience for researchers worldwide," said Sameer Velankar, Team Leader at EMBL-EBI.

"With the explosion of predicted protein structures we have in AFDB, managing and navigating these data efficiently has been a significant challenge," he continued. "Foldseek Cluster has revolutionized this process. We are working on integrating FoldSeek clusters into AFDB to streamline the analysis of large sets of protein structures and make it easier for our user community to find exactly what they're looking for."

September 18, 2023

Editor & Publisher: Jose Villarreal
Art Director: Juan José Sepúlveda Ramírez
