With an ALCF Data Science Program award, researchers from NCSA and ALCF are using deep learning and high-performance computing to develop a highly accurate method for classifying hundreds of millions of unlabeled galaxies.
In 2007, the Sloan Digital Sky Survey (SDSS) launched a citizen science campaign called Galaxy Zoo to enlist the public’s help in classifying the hundreds of thousands of galaxy images captured by an optical telescope. Through this highly successful crowdsourcing effort, volunteers reviewed the images online to help determine whether each galaxy had a spiral or elliptical structure.
Leveraging data generated by the Galaxy Zoo project, a team of scientists is now applying the power of artificial intelligence (AI) and high-performance supercomputers to accelerate efforts to analyze the increasingly massive datasets produced by ongoing and future cosmological surveys.
In a new study, researchers from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and the Argonne Leadership Computing Facility (ALCF) at the U.S. Department of Energy’s (DOE) Argonne National Laboratory have developed a novel combination of deep learning methods to provide a highly accurate approach to classifying hundreds of millions of unlabeled galaxies. The team’s findings were published in Physics Letters B.
“The NCSA Gravity Group initiated, and continues to spearhead, the use of deep learning at scale for gravitational wave astrophysics. We have expanded our research portfolio to address a computational grand challenge in cosmology, innovating the use of several deep learning methods in combination with high-performance computing (HPC),” said Eliu Huerta, NCSA Gravity Group Lead. “Our work also showcases how the interoperability of NSF and DOE supercomputing resources can be used to accelerate science.”
“Deep learning research has rapidly become a booming enterprise across multiple disciplines. Our findings show that the convergence of deep learning and HPC can address big-data challenges of large-scale electromagnetic surveys. This research is part of a multidisciplinary program at NCSA to push the boundaries of AI and HPC in scientific research,” added Asad Khan, a graduate student at the NCSA Gravity Group and lead author of this study.
Supported by an ALCF Data Science Program award, the team used the SDSS datasets produced by the Galaxy Zoo campaign to train neural network models to classify galaxies in the Dark Energy Survey (DES) that lie within the overlapping footprint of both surveys. The method identified spiral and elliptical galaxies with 99.6 percent accuracy.
“Using the millions of classifications carried out by the public in the Galaxy Zoo project to train a neural network is an inspiring use of the citizen science program,” said Elise Jennings, ALCF computer scientist. “This exciting research also sheds light on the inner workings of the neural network, which clearly learns two distinct feature clusters to identify spiral and elliptical galaxies.”
The team’s innovative framework lays the foundations to exploit deep transfer learning at scale, data clustering and recursive training to produce large-scale galaxy catalogs in the Large Synoptic Survey Telescope (LSST) era.
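For readers curious about what recursive training on unlabeled data can look like in practice, the following is a minimal, illustrative sketch and not the team's implementation. It assumes a toy nearest-centroid classifier in place of a deep neural network, and hypothetical two-number feature vectors in place of galaxy images. The function names, the confidence threshold, and the synthetic data are all assumptions made for illustration. A model fitted on labeled "source" examples assigns labels to unlabeled "target" examples, keeps only its most confident predictions, and is refitted with those added.

```python
import random

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit(features, labels):
    """Toy stand-in for a neural network: one centroid per class."""
    return {c: centroid([f for f, l in zip(features, labels) if l == c])
            for c in sorted(set(labels))}

def predict(model, x):
    """Return (label, confidence); confidence lies in [0.5, 1] for two classes."""
    dists = {c: distance(x, mu) for c, mu in model.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    conf = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, conf

def recursive_train(features, labels, unlabeled, threshold=0.8, rounds=5):
    """Refit repeatedly, adding confidently pseudo-labeled examples each round."""
    features, labels, pool = list(features), list(labels), list(unlabeled)
    model = fit(features, labels)
    for _ in range(rounds):
        remaining, added = [], 0
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                features.append(x)
                labels.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was confident enough; stop early
            break
        model = fit(features, labels)
    return model, pool

# Synthetic, hypothetical data: two well-separated classes of 2D features.
rng = random.Random(0)

def blob(cx, cy, n):
    return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

# Labeled "source" survey examples.
source_x = blob(0.0, 0.0, 50) + blob(4.0, 4.0, 50)
source_y = ["spiral"] * 50 + ["elliptical"] * 50

# Unlabeled "target" survey examples, slightly shifted to mimic a different instrument.
target_x = blob(0.5, 0.5, 50) + blob(4.5, 4.5, 50)

model, leftover = recursive_train(source_x, source_y, target_x)
print(sorted(model), len(leftover))
```

The confidence threshold is the main design choice in a loop like this: a high value admits fewer pseudo-labels per round but lowers the risk of reinforcing early mistakes.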
“We’re excited to work with the team at NCSA and Argonne as well as the researchers who drove the original Galaxy Zoo effort to pursue this important area of scientific discovery,” said Tom Gibbs, manager of developer relations at NVIDIA. “Using these new methods, we’re taking an important step to understanding the mystery of dark energy.”
Highlights of the study include:
Acknowledgments:
The ALCF is a DOE Office of Science User Facility. This research was carried out as part of the ALCF Data Science Program (ADSP). The goal of ADSP is to accelerate discoveries across a broad range of scientific domains by supporting projects that require data-intensive and machine learning algorithms to address challenging research problems at scale.
This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the State of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). We gratefully acknowledge grant TG-PHY160053.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.
The U.S. Department of Energy's Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science