Analyzing Interconnect Congestion on a Production Dragonfly System
As the HPC community continues along the road to exascale, and HPC systems grow ever larger and busier, the question of how network traffic on these systems affects application performance looms large. To fully address this question, the HPC community needs a broader understanding of how traffic behaves on production systems. This talk will present an analysis of communication traffic on the Theta system at the Argonne Leadership Computing Facility (ALCF), focusing on how congestion is distributed across the system in both space and time.
Joy Kitson is a PhD student at the University of Maryland and was a virtual intern at Argonne National Laboratory this summer. She graduated from the University of Delaware this spring with a Bachelor of Science in Computer Science and Applied Mathematics. She worked on the Caliper project at LLNL last summer. At SC18, she co-presented the work on Effective Performance Portability that her team at LANL carried out over summer 2018.