Modern-day hardware platforms are parallel and diverse, ranging from mobile devices to data centers, and co-location of mainstream parallel applications is increasingly common. The resulting resource contention can drastically degrade a program’s performance. In addition, the execution environment, composed of workloads and hardware resources, is dynamic and unpredictable. Efficiently matching program parallelism to machine parallelism under such uncertainty is hard: mapping policies must anticipate these variations and make applications resilient to them. This talk proposes solutions for mapping parallel programs in dynamic environments. They employ predictive modelling techniques to map programs adaptively by determining the best degree of parallelism. When evaluated on highly dynamic executions, these solutions outperform default, state-of-the-art adaptive, and analytic approaches.
Next, I will introduce a transparent fault-tolerance approach for MPI that leverages the application checkpoint/restart mechanism used in scientific applications. I will then present a novel approach to optimizing applications running on heterogeneous systems. This work analyzes parallel code and uses a machine learning model to decide the best data placement in the multi-level memory hierarchy of GPUs.