Martin Schulz - Adaptive Resource Management for Next Generation Systems

Thursday, 11 Jan 2018, 14:00
Organized by: 
The Keynote Speeches team: Sihem Amer-Yahia, Jérôme David, Renaud Lachaize
Speaker: 
Martin Schulz, Technische Universität München (TUM)

Detailed information: 

Martin Schulz is a Full Professor and Chair for Computer Architecture and Computer Organization at the Technische Universität München (TUM), which he joined in 2017. Prior to that, he held positions at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL) and Cornell University. He earned his Doctorate in Computer Science in 2001 from TUM and a Master of Science in Computer Science from UIUC. Martin has published over 200 peer-reviewed papers and currently serves as the chair of the MPI Forum, the standardization body for the Message Passing Interface. His research interests include parallel and distributed architectures and applications; performance monitoring, modeling and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; power-aware parallel computing; and fault tolerance at the application and system level. Martin was a recipient of the IEEE/ACM Gordon Bell Award in 2006 and an R&D 100 award in 2011.

 

Abstract: 

As we move into the exascale era and beyond, high-performance computing systems will become increasingly resource-constrained, and these constraints will span a growing number of different resources. To address this problem, we need new and more adaptive resource management approaches that can handle multi-constraint scenarios and adjust themselves to changing conditions in the system. In the first part of the talk, I will discuss these challenges using constraints on power and energy as an example, and will show how such constraints can, in some cases, have unexpected consequences for application performance.

To solve these challenges, however, we first need to better understand the exact behavior of our systems, their bottlenecks, and the impact our workloads have on them. This requires system-wide monitoring and performance data management, from system-level measurements to application feedback, combined with matching analytics capabilities. In the second part of the talk, I will discuss concepts that enable such monitoring and how they can be used both to feed user-facing tools and to drive new resource management schemes. This is a first step towards more efficient utilization of scarce resources and can ultimately lead to new design tradeoffs for future systems.