Thursday, January 15th, 2026
Quantized DNN learning algorithms with limited hardware overhead for Edge implementation
Abstract
Artificial intelligence (AI) models based on deep neural networks (DNNs) have grown considerably in complexity over the last few years, and the scope of their applications keeps widening. However, these models require a training phase that demands large resources, such as computer memory and training data.
Simultaneously, the development of the Internet of Things (IoT) raises the challenge of deploying AI solutions closer to sensors, directly on tiny devices. This concept, called edge AI, targets applications in various fields such as industry or medicine. The devices concerned range from microcontroller units (MCUs) to smartphones: they offer little computing power and impose strong constraints in terms of energy, memory, and latency. Yet the resources required by DNN-based models often exceed these constraints. Edge AI also raises privacy concerns, since deployed devices are surrounded by potentially sensitive data.
This thesis focuses on edge AI and on how models can be compressed, studying ways to deploy and train them directly on tiny devices. The work covers parameter quantization methods and learning algorithms suited to the resource constraints inherent to the edge context. The contributions presented are:
• A benchmark of model compression methods using Quantization Aware Training (QAT), which trains models to perform their computations in low numerical precision (a minimal sketch of the idea follows this list). This analysis suggests that existing, simple methods deliver good results and are easy to use, whereas more sophisticated methods do not necessarily outperform them;
• A study of the trade-off between a model's complexity and its numerical precision at constant memory footprint (see the worked example after this list), which advocates in favor of large, heavily quantized models;
• Methods for on-device training of DNNs, either by leveraging the unique properties of specific analog hardware or by using dedicated learning algorithms, reducing the memory required to train a model by a factor of at least ten.
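To make the QAT idea in the first contribution concrete, here is a minimal PyTorch sketch of the standard fake-quantization approach: weights are quantized in the forward pass, and the straight-through estimator (STE) lets gradients flow as if quantization were the identity. The names FakeQuantize and QATLinear, and the 8-bit symmetric scheme, are illustrative assumptions, not the specific methods benchmarked in the thesis.

```python
import torch


class FakeQuantize(torch.autograd.Function):
    """Uniform symmetric fake quantization with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w, num_bits):
        qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
        scale = w.abs().max().clamp(min=1e-8) / qmax
        # Round onto the integer grid, then map back to real values.
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pretend quantization is the identity in the backward pass.
        return grad_output, None


class QATLinear(torch.nn.Linear):
    """Linear layer trained with fake-quantized weights."""

    def forward(self, x):
        w_q = FakeQuantize.apply(self.weight, 8)
        return torch.nn.functional.linear(x, w_q, self.bias)
```

After training, the weights can be stored as low-bit integers, which is where the memory savings at deployment come from.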
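The constant-memory comparison in the second contribution rests on simple arithmetic: the parameter memory of a model is (number of parameters) × (bits per parameter) / 8 bytes, so a model can be made larger if it is quantized more aggressively. A small worked example, with a hypothetical helper function:

```python
def parameter_memory_bytes(num_params: int, bits_per_param: int) -> float:
    """Memory occupied by the model parameters alone, in bytes."""
    return num_params * bits_per_param / 8

# At constant memory footprint, an 8x larger model can be stored
# if its precision drops from 32-bit floats to 4-bit integers:
small_fp32 = parameter_memory_bytes(1_000_000, 32)   # 4.0 MB
large_int4 = parameter_memory_bytes(8_000_000, 4)    # 4.0 MB
assert small_fp32 == large_int4
```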
Date and location
Thursday, January 15th at 14:00
Bâtiment GreEn-ER, on the Presqu'île
Also accessible by videoconference
Composition of the jury
Jury
Olivier Sentieys
INRIA Rennes (Reviewer)
Florent de Dinechin
INSA Lyon (Reviewer)
Stefan Duffner
INSA Lyon (Examiner)
Julia Gusak
INRIA Bordeaux (Examiner)
Frédéric Pétrot
ENSIMAG (Examiner)
Kim Thang Nguyen
ENSIMAG (Thesis advisor)
Invited members
Thomas Mesquida
CEA LIST (Co-supervisor)
Sylvain Bouveret
ENSIMAG (Co-supervisor)