How much can we reduce scientific data without losing science?
Thursday, May 11, 2023
Data is the fourth pillar of science. However, in many scientific domains, the volume and velocity of data have become unmanageable.

Data reduction is a necessity. It is tempting to develop dedicated data reduction techniques for every science domain, every experiment, every data field, and every user, so as to preserve the maximum potential for scientific discovery. However, this is not feasible given the cost and time involved. Lossy compression is a generic data reduction technique that works well for many consumer applications (photos, videos, music). The few research teams that set out to design lossy compression algorithms for scientific data have made exceptional progress over the past seven years. We can now reduce scientific datasets significantly, compress and decompress at hundreds of GB/s, and apply compression to many different use cases. Even more striking, progress is still happening and has in fact been remarkably continuous. Stepping back, this raises a fundamental question: How much can we reduce scientific data without losing science?


Cappello received his Ph.D. from the University of Paris XI in 1994 and joined CNRS, the French National Center for Scientific Research. In 2003, he joined INRIA, where he held the position of permanent senior researcher. He initiated the Grid’5000 project in 2003 and served as its director from 2003 to 2008. In 2009, Cappello and Marc Snir created the Joint Laboratory on Petascale Computing, which in 2014 became the Joint Laboratory on Extreme Scale Computing (JLESC: https://jlesc.github.io), bringing together seven of the most prominent research and production centers in supercomputing: NCSA, Inria, ANL, BSC, JSC, Riken CCS, and UTK/ICL. Starting in 2008, as a member of the executive committee of the International Exascale Software Project, he led the roadmap and strategy efforts on exascale resilience. In 2016, Cappello became the lead of two Exascale Computing Project (ECP: https://www.exascaleproject.org/) software projects on resilience and on lossy compression of scientific data. Over his 25-year research career, Cappello has co-authored more than 250 publications and directed the development of several high-impact software tools, including XtremWeb, one of the first desktop grid systems, the VeloC multilevel checkpointing environment, and the SZ lossy compressor for scientific data (https://exascaleproject.org/wp-content/uploads/2019/11/VeloC_SZ.pdf). He is an IEEE Fellow and the recipient of the 2022 HPDC Achievement Award, two R&D 100 Awards (2019 and 2021), the 2018 IEEE TCPP Outstanding Service Award, and the 2021 IEEE Transactions on Computers Award for Editorial Service and Excellence.

Updated May 4, 2023