Proteins aren’t static; they have a wide range of motions that span multiple length and time scales, and it’s not always understood which motions are important, notes Ramanathan. Understanding and simulating those motions requires a huge amount of data and computing resources.
Developing a reasonable simulation of the spike protein alone can create a huge system of approximately 1.8 million atoms, and the simulations can produce enormous datasets that tax the resources of even the largest supercomputers. To make that data more accessible for interpretation, the team developed a machine learning method that can summarize large volumes of data.
“One of the key things that this method allowed us to do was to determine what was interesting, what was important, even those things that were not obvious to the human eye,” said Ramanathan. “So, when you look deeper using the simulations, you start seeing significant changes in the protein structure, which told us something about how the spike protein opens up such that it can interact with the ACE2 receptor.”
As the systems they were working on grew in size, the team faced the challenge of handling all of that data at scale on today’s biggest and best supercomputing systems and their key components.
Because many of the machine learning models they were training on these large simulations needed to be scaled efficiently for use on supercomputers, the team partnered with NVIDIA, a leader in GPU and artificial intelligence design, to run the models effectively on Summit, at the DOE’s Oak Ridge National Laboratory. The team also used many of the top U.S. supercomputers, including Theta at Argonne; Frontera/Longhorn at the Texas Advanced Computing Center; Comet at the San Diego Supercomputer Center; and Lassen at DOE’s Lawrence Livermore National Laboratory, to uncover alternate ways to handle the deluge of data.
“Given the complexity of the data, trying to understand the ACE2 receptor-spike interaction seemed almost impossible at this scale,” Ramanathan confided. “One of the things that we clearly showed was that we could actuate a sampling of these dynamical configurations, pushing the idea that we could use AI to bridge these different scales.”
The data generated so far is providing new insights into how the stalk region of the spike protein changes its overall motions when it interacts with the ACE2 receptor, he said. Eventually, insights like these, derived from tightly coupling machine learning and simulation, will help facilitate antibody or vaccine discoveries.
The team’s article, “AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics,” will appear in the International Journal of High Performance Computing Applications, 2020.
This research was supported by the Exascale Computing Project, a collaborative effort of the U.S. DOE Office of Science and the National Nuclear Security Administration, and the DOE’s National Virtual Biotechnology Laboratory with funding from the Coronavirus CARES Act. This work used resources, services, and support from the COVID-19 HPC Consortium.
==========
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.