
Join us on April 30, 2025, for a webinar on the messaging software implementation on Aurora and how to choose the right programming environment. ALCF’s Vitali Morozov will discuss affinity and process placement, which ensure that CPU cores, GPUs, NICs, and memory domains interact over the shortest path and with maximum efficiency.
This webinar is intended for users planning to use a large fraction of Aurora for production computations and will focus on considerations for a large-scale, multi-node environment.
Aurora is an exascale supercomputer located at the Argonne Leadership Computing Facility (ALCF). The system has 10,624 compute nodes, each with two 52-core Intel Xeon CPU Max Series processors, six discrete Intel Data Center GPU Max Series GPUs, two DDR5 memory domains, two high-bandwidth memory domains, and eight Slingshot network cards. The machine is programmed with the Message Passing Interface (MPI); however, the complexity and scale of the system require special considerations to achieve the expected stability and performance. Intel has made significant contributions to improving the interaction of the various system components within an MPI context, and the webinar will present the results of some of those contributions that users can apply. Part of the webinar will cover how to choose the number of processes per node, how to distribute processes within a node, whether to use a compact or distributed allocation of nodes in the system, collective operations, and recommended environment variables.
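To make the placement question concrete, here is a minimal sketch in Python of the arithmetic behind distributing processes within a node. It uses only the node shape given above (two 52-core CPUs, six GPUs, eight network cards). Everything else is an assumption made for illustration: the function name place_ranks, the contiguous core numbering per socket, the even split of GPUs and network cards between the two sockets, and the round-robin assignment. It is not the ALCF-recommended binding, and it does not call any MPI or scheduler API.

```python
from dataclasses import dataclass

# Node shape taken from the system description above.
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 52
GPUS_PER_NODE = 6
NICS_PER_NODE = 8


@dataclass(frozen=True)
class Placement:
    local_rank: int
    socket: int
    cores: range  # core IDs, assuming contiguous numbering per socket
    gpu: int
    nic: int


def place_ranks(ranks_per_node: int) -> list[Placement]:
    """Hypothetical helper: spread ranks evenly over both sockets.

    Assumption (not from the announcement): GPUs and NICs are split
    evenly between the sockets, so a rank only uses devices that sit
    on its own socket.
    """
    if ranks_per_node < 1 or ranks_per_node % SOCKETS_PER_NODE != 0:
        raise ValueError("ranks_per_node must be a positive multiple of 2")

    ranks_per_socket = ranks_per_node // SOCKETS_PER_NODE
    cores_per_rank = CORES_PER_SOCKET // ranks_per_socket
    if cores_per_rank < 1:
        raise ValueError("more ranks per socket than cores per socket")

    gpus_per_socket = GPUS_PER_NODE // SOCKETS_PER_NODE
    nics_per_socket = NICS_PER_NODE // SOCKETS_PER_NODE

    placements = []
    for local_rank in range(ranks_per_node):
        socket = local_rank // ranks_per_socket
        slot = local_rank % ranks_per_socket  # position within the socket
        first_core = socket * CORES_PER_SOCKET + slot * cores_per_rank
        placements.append(
            Placement(
                local_rank=local_rank,
                socket=socket,
                cores=range(first_core, first_core + cores_per_rank),
                # Round-robin over the devices assumed local to this socket.
                gpu=socket * gpus_per_socket + slot % gpus_per_socket,
                nic=socket * nics_per_socket + slot % nics_per_socket,
            )
        )
    return placements


if __name__ == "__main__":
    for p in place_ranks(12):
        print(
            f"rank {p.local_rank}: socket {p.socket}, "
            f"cores {p.cores.start}-{p.cores.stop - 1}, "
            f"gpu {p.gpu}, nic {p.nic}"
        )
```

With 12 processes per node, for example, each process in this sketch gets 8 cores on one socket and shares a GPU and a network card only with processes on the same socket. Because 52 is not divisible by 6, four cores per socket remain unassigned, which is one reason the choice of process count per node deserves attention.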
Vitali Morozov is a Senior Software Engineer at the Argonne Leadership Computing Facility. He received an M.S. in Mathematics and an M.S. in Computer Science from Novosibirsk State University, and a Ph.D. in Engineering from Ershov’s Institute for Informatics Systems, Novosibirsk, Russia. At Argonne since 2001, he has worked on computer simulation of plasma generation, plasma-material interactions, plasma thermal and optical properties, and applications to laser- and discharge-produced plasmas. At the ALCF, he works on performance projections, performance analysis and simulation, the study of hardware trends, and the evaluation of experimental and non-conventional hardware. He also ports and tunes applications for large-scale supercomputers.