As computer simulations continue to grow in size and complexity, they present a particularly challenging class of big data problems. Many application areas are moving toward exascale computing systems, which perform a billion billion (not a typo!) floating-point operations per second (FLOPS). Simulations at this scale can generate output that exceeds both the storage capacity and the bandwidth available for transfer to storage, making post-processing and analysis challenging. One approach is to embed some analyses in the simulation while it is running, a strategy often called in situ analysis, to reduce the need for transfer to storage. Another strategy is to save only a reduced set of time steps rather than the full simulation. In the latter case, the selected time steps are typically evenly spaced, with the spacing determined by the budget for storage and transfer. This strategy is easy to implement but fails to recognize “interesting” regions of the simulation where additional saves might provide scientific value. Our work combines both of these ideas to introduce an online in situ method for identifying a more compelling set of time steps of the simulation to save. Our approach significantly reduces the data transfer and storage requirements, and it provides improved fidelity to the simulation to facilitate post-processing and reconstruction. We illustrate the method using a computer simulation that supported NASA’s 2009 Lunar Crater Observation and Sensing Satellite mission.
This is joint work with Earl Lawrence, Mike Fugate, Claire Bowen, Larry Ticknor, Joanne Wendelberger, Jon Woodring, and Jim Ahrens.
Speaker’s Bio:
Kary Myers is a scientist and deputy group leader in the Statistical Sciences Group at Los Alamos National Laboratory (stat.lanl.gov). With support from an AT&T Labs Fellowship, she earned her PhD from Carnegie Mellon’s Statistics Department and her MS from their Machine Learning Department. At Los Alamos, she has been involved with a range of data-intensive projects, from examining electromagnetic measurements, to aiding large-scale computer simulations, to developing analyses for chemical spectra from the Mars Science Laboratory Curiosity Rover. She is currently the Science Integration Lead for NA-22’s Multi-Informatics for Nuclear Operations Scenarios (MINOS) venture, and she’s Los Alamos’s practicum coordinator for the DOE’s Computational Science Graduate Fellowship. She has served as an associate editor for the Annals of Applied Statistics and the Journal of Quantitative Analysis in Sports, and she created and organizes CoDA, the Conference on Data Analysis (cnls.lanl.gov/coda2020), to showcase data-driven research from across the Department of Energy.