In situ models offer an alternative to classical post hoc workflows: by bypassing disk accesses, they reduce the I/O bottleneck. However, because most in situ data analytics tools are built on MPI, they are complicated to use, especially for parallelizing irregular algorithms. In this seminar, we will discuss the motivations for introducing a bridging model between bulk-synchronous parallel and distributed task-based models. We will then present our proof of concept, Deisa [1], which couples MPI with Dask to provide a higher-level and easier way to write in situ analytics, and describe its architecture and its integration with Dask distributed. Finally, we will present our results compared to post hoc analytics.
[1] A. Gueroudji, J. Bigot and B. Raffin, "DEISA: Dask-Enabled In Situ Analytics," 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2021, pp. 11-20, doi: 10.1109/HiPC53243.2021.00015.