Abstract:
The growing demand for efficient data analysis and visualization in modern HPC applications has spurred efforts to leverage data-centric frameworks from the Big Data ecosystem. However, these platforms must be adapted to take full advantage of HPC infrastructures. This seminar explores the possibility of layering the popular Spark application model and its higher-level tools (e.g., GraphX, Streaming, MLlib) on top of a highly scalable MPI-based library (DIY). We will present an architecture that maps the RDD data abstraction of Spark and its task-oriented execution model onto the block-based nature of DIY and the underlying fabric of MPI processes. As a result, the data-intensive communication patterns implemented in DIY are transparently supported with minor additions to the Spark programming model.
Bio:
Silvina Caíno-Lores is a PhD candidate at Carlos III University of Madrid under the supervision of Prof. Jesús Carretero and Prof. Florin Isaila.