
Lucas Leandro Nesi

Strategies for Distributing Task-Based Applications on Heterogeneous Platforms

Thursday, 14 September 2023

Abstract

HPC platforms are highly heterogeneous in two ways: within nodes, because of resources such as accelerators, and across nodes, when a platform combines different types of machines. The applications that use these resources are already very complex, with many distinct operations and phases, and developers must take a diverse set of computational resources into account. The task-based programming paradigm is a modern alternative for increasing computational efficiency on heterogeneous intra-node resources while keeping development relatively simple. The application defines a Directed Acyclic Graph (DAG) of tasks, and a dynamic runtime asynchronously schedules these tasks on the resources while respecting task dependencies. However, handling different types of nodes requires new, specific strategies for distributing an application in this asynchronous and heterogeneous environment. This thesis studies the problem of distributing such complex task-based applications over these diverse system-level resources, proposing strategies to divide their load correctly while considering computational heterogeneity, multiple-phase asynchronism, and adaptability. This work validates its results with real applications, through experiments conducted on large testbeds and a supercomputer. The thesis's main contributions are the following. (i) Strategies for distributing a single application operation that consider the trade-off between communication, critical path, and heterogeneous load balancing. (ii) A set of optimizations that improve asynchronous phase overlap in applications. (iii) A methodology for computing the relative power of each phase on each heterogeneous group of nodes, taking phase overlap into account. (iv) A distribution strategy for an antecedent phase that reduces redistribution communication. (v) A strategy that lets the application adapt dynamically during execution to decide the best subset of nodes for each phase. (vi) An extended, comprehensive analysis of the experiments, including a methodology for analyzing per-node application progress that is resilient to heterogeneity and can cluster nodes with similar behavior. Ultimately, this thesis is a step toward efficiently exploiting and combining any of these diverse resources, using them to better meet applications' distinct needs and improving their overall performance.
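To make the task-based model described in the abstract concrete, here is a minimal toy sketch of a runtime placing the tasks of a DAG on heterogeneous workers while respecting dependencies. It is only an illustration under stated assumptions: the `Task` class, the `schedule` function, the per-resource cost values, and the greedy earliest-finish-time rule are all hypothetical choices made for this example. They do not represent the thesis's distribution strategies or the API of any real runtime, and the sketch ignores communication costs entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cost: dict[str, float]  # hypothetical duration per resource type, e.g. {"cpu": 4.0, "gpu": 1.0}
    deps: list[str] = field(default_factory=list)  # names of tasks this one depends on


def schedule(tasks: list[Task], workers: dict[str, str]):
    """Toy list scheduler (hypothetical): walk the DAG in topological order and
    place each ready task on the worker (worker name -> resource type) that gives
    the earliest finish time. Communication costs are ignored."""
    by_name = {t.name: t for t in tasks}
    indeg = {t.name: len(t.deps) for t in tasks}
    children = {t.name: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            children[d].append(t.name)

    ready = deque(n for n, k in indeg.items() if k == 0)
    worker_free = {w: 0.0 for w in workers}  # time at which each worker becomes idle
    finish, placement = {}, {}
    while ready:
        task = by_name[ready.popleft()]
        # A task can only start once all of its predecessors have finished.
        data_ready = max((finish[d] for d in task.deps), default=0.0)

        def end_time(w):
            return max(worker_free[w], data_ready) + task.cost[workers[w]]

        best = min(workers, key=end_time)
        finish[task.name] = end_time(best)
        worker_free[best] = finish[task.name]
        placement[task.name] = best
        # Release successors whose dependencies are now all satisfied.
        for c in children[task.name]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)

    if len(finish) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return placement, finish


# Usage: a diamond-shaped DAG (A -> B, A -> C, {B, C} -> D) on one CPU and one GPU.
tasks = [
    Task("A", {"cpu": 2.0, "gpu": 1.0}),
    Task("B", {"cpu": 4.0, "gpu": 1.0}, ["A"]),
    Task("C", {"cpu": 3.0, "gpu": 3.0}, ["A"]),
    Task("D", {"cpu": 2.0, "gpu": 1.0}, ["B", "C"]),
]
placement, finish = schedule(tasks, {"cpu0": "cpu", "gpu0": "gpu"})
print(placement)  # {'A': 'gpu0', 'B': 'gpu0', 'C': 'cpu0', 'D': 'gpu0'}
print(max(finish.values()))  # 5.0
```

In this example, task C runs on the CPU even though the GPU is not slower for it, because the GPU is still busy with B. This is the kind of heterogeneous load balancing a dynamic runtime performs inside a node. Distributing such work across different types of nodes, which is the problem the thesis addresses, also requires accounting for communication and phase overlap, and this toy model leaves both out.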

 

Date and Venue

Thursday, 14 September, at 2 p.m.
UFRGS Informatics Institute, Building 43412
Also streamed live at https://www.youtube.com/live/wlNHf3EJ8cg.

 

Jury Composition

Alba Cristina MAGALHAES ALVES de MELO
University of Brasília
Katherine E. ISAACS
The University of Utah
François TRAHAY
Télécom SudParis
Hatem LTAIEF
King Abdullah University of Science and Technology
Sascha HUNOLD
Technische Universität Wien
Yves DENNEULIN
Université Grenoble Alpes
Marcus ROLF PETER RITT
UFRGS

Published on 14 September 2023

Updated on 14 September 2023