Established in 2020 Wednesday, April 17, 2024


Speeding up drug discovery with diffusion generative models
DiffDock initiates its inference process with a set of random poses. These poses are then refined in 20 steps to be candidates for the binding pose of the ligand. The previous image shows the final steps in the process. Image courtesy of the researchers.



CAMBRIDGE, MASS.- With the release of platforms like DALL-E 2 and Midjourney, diffusion generative models have achieved mainstream popularity, owing to their ability to generate a series of absurd, breathtaking, and often meme-worthy images from text prompts like “teddy bears working on new AI research on the moon in the 1980s.” But a team of researchers at MIT's Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) thinks there could be more to diffusion generative models than just creating surreal images — they could accelerate the development of new drugs and reduce the likelihood of adverse side effects.

A paper introducing this new molecular docking model, called DiffDock, will be presented at the 11th International Conference on Learning Representations. The model's unique approach to computational drug design is a paradigm shift from current state-of-the-art tools that most pharmaceutical companies use, presenting a major opportunity for an overhaul of the traditional drug development pipeline.

Drugs typically function by interacting with the proteins that make up our bodies, or proteins of bacteria and viruses. Molecular docking was developed to gain insight into these interactions by predicting the atomic 3D coordinates with which a ligand (i.e., drug molecule) and protein could bind together.

While molecular docking has led to the successful identification of drugs that now treat HIV and cancer, with each drug averaging a decade of development time and 90 percent of drug candidates failing costly clinical trials (most studies estimate average drug development costs to be around $1 billion to over $2 billion per drug), it’s no wonder that researchers are looking for faster, more efficient ways to sift through potential drug molecules.

Currently, most molecular docking tools used for in-silico drug design take a “sampling and scoring” approach, searching for a ligand “pose” that best fits the protein pocket. This time-consuming process evaluates a large number of different poses, then scores them based on how well the ligand binds to the protein.

In previous deep-learning solutions, molecular docking is treated as a regression problem. In other words, “it assumes that you have a single target that you’re trying to optimize for and there’s a single right answer,” says Gabriele Corso, co-author and second-year MIT PhD student in electrical engineering and computer science who is an affiliate of the MIT Computer Sciences and Artificial Intelligence Laboratory (CSAIL). “With generative modeling, you assume that there is a distribution of possible answers — this is critical in the presence of uncertainty.”

“Instead of a single prediction as previously, you now allow multiple poses to be predicted, and each one with a different probability,” adds Hannes Stärk, co-author and first-year MIT PhD student in electrical engineering and computer science who is an affiliate of the MIT Computer Sciences and Artificial Intelligence Laboratory (CSAIL). As a result, the model doesn't need to compromise in attempting to arrive at a single conclusion, which can be a recipe for failure.




To understand how diffusion generative models work, it is helpful to explain them based on image-generating diffusion models. Here, diffusion models gradually add random noise to a 2D image through a series of steps, destroying the data in the image until it becomes nothing but grainy static. A neural network is then trained to recover the original image by reversing this noising process. The model can then generate new data by starting from a random configuration and iteratively removing the noise.

In the case of DiffDock, after being trained on a variety of ligand and protein poses, the model is able to successfully identify multiple binding sites on proteins that it has never encountered before. Instead of generating new image data, it generates new 3D coordinates that help the ligand find potential angles that would allow it to fit into the protein pocket.

This “blind docking” approach creates new opportunities to take advantage of AlphaFold 2 (2020), DeepMind’s famous protein folding AI model. Since AlphaFold 1’s initial release in 2018, there has been a great deal of excitement in the research community over the potential of AlphaFold’s computationally folded protein structures to help identify new drug mechanisms of action. But state-of-the-art molecular docking tools have yet to demonstrate that their performance in binding ligands to computationally predicted structures is any better than random chance.

Not only is DiffDock significantly more accurate than previous approaches to traditional docking benchmarks, thanks to its ability to reason at a higher scale and implicitly model some of the protein flexibility, DiffDock maintains high performance, even as other docking models begin to fail. In the more realistic scenario involving the use of computationally generated unbound protein structures, DiffDock places 22 percent of its predictions within 2 angstroms (widely considered to be the threshold for an accurate pose, 1Å corresponds to one over 10 billion meters), more than double other docking models barely hovering over 10 percent for some and dropping as low as 1.7 percent.

These improvements create a new landscape of opportunities for biological research and drug discovery. For instance, many drugs are found via a process known as phenotypic screening, in which researchers observe the effects of a given drug on a disease without knowing which proteins the drug is acting upon. Discovering the mechanism of action of the drug is then critical to understanding how the drug can be improved and its potential side effects. This process, known as “reverse screening,” can be extremely challenging and costly, but a combination of protein folding techniques and DiffDock may allow performing a large part of the process in silico, allowing potential “off-target” side effects to be identified early on before clinical trials take place.

“DiffDock makes drug target identification much more possible. Before, one had to do laborious and costly experiments (months to years) with each protein to define the drug docking. But now, one can screen many proteins and do the triaging virtually in a day,” Tim Peterson, an assistant professor at the University of Washington St. Louis School of Medicine, says. Peterson used DiffDock to characterize the mechanism of action of a novel drug candidate treating aging-related diseases in a recent paper. “There is a very ‘fate loves irony’ aspect that Eroom’s law — that drug discovery takes longer and costs more money each year — is being solved by its namesake Moore’s law — that computers get faster and cheaper each year — using tools such as DiffDock.”

This work was conducted by MIT PhD students Gabriele Corso, Hannes Stärk, and Bowen Jing, and their advisors, Professor Regina Barzilay and Professor Tommi Jaakkola, and was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Jameel Clinic, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats program, the DARPA Accelerated Molecular Discovery program, the Sanofi Computational Antibody Design grant, and a Department of Energy Computational Science Graduate Fellowship.

Reprinted with permission of MIT News







Today's News

April 2, 2023

Scientists observe flattest explosion ever seen in space

Prototype taps into the sensing capabilities of any smartphone to screen for prediabetes

Speeding up drug discovery with diffusion generative models

'Exquisite' sabretooth skull offers clues about Ice Age predator

A paper-based sensor to detect pesticides in food quickly and cheaply

Scallop eyes as inspiration for new microscope objectives

AI predicts enzyme function better than leading tools

Thermal paint: MXene spray coating can harness infrared radiation for heating or cooling

New circuit model offers insights into brain function

Pulsating blood vessels wash your brain while you sleep

Study reveals that bats experience hearing loss in old age

Pathogenic genetic variations found to boost the risk of H. pylori-related stomach cancer

Across the divide: Manufacturing better batteries

Researchers develop new, patient-friendly hydrogel platform for administering lifesaving biologics

'Taffy Galaxies' collide, leave behind bridge of star-forming material

Ultrasmall swirling magnetic vortices detected in iron-containing material

Flow cytometry tool improves methods to rapidly analyze human, plant, fungal and bacterial metabolism

A 20-year study may upend long-held theory about chromosomes and cancer



 


Editor & Publisher: Jose Villarreal
Art Director: Juan José Sepúlveda Ramírez



Tell a Friend
Dear User, please complete the form below in order to recommend the ResearchNews newsletter to someone you know.
Please complete all fields marked *.
Sending Mail
Sending Successful