
Victor Boone

Monday, November 25th, 2024

Optimal regrets in Markov decision processes

Abstract:
In this manuscript, we investigate regret minimization in Markov decision processes under the average gain criterion. In both the model-independent (minimax) and model-dependent settings, we provide new lower bounds on the expected regret, together with algorithms that achieve them. These algorithms are therefore optimal and solve, at first order, the long-standing problem of regret minimization in communicating Markov decision processes. Beyond regret minimization, we study the trajectorial behavior of classical algorithms from a novel local viewpoint, through new learning metrics that quantify how algorithms choose actions locally rather than globally. We also provide techniques to correct the behavior of existing algorithms with respect to these metrics.

Date and place

Monday, November 25th, at 2pm, Bâtiment IMAG, Salle 2 (ground floor)
and on Zoom

Jury members

Aurélien Garivier
Professor, ENS de Lyon (Reviewer)
Ronald Ortner
Associate Professor, Montanuniversität Leoben (Reviewer)
Eric Gaussier
Full Professor, Université Grenoble Alpes (Examiner)
Pierre Gaillard
Research Scientist, Inria Centre at Université Grenoble Alpes (Examiner)
Tor Lattimore
Research Scientist, DeepMind London (Examiner)
Bruno Gaujal
Research Director, Inria Centre at Université Grenoble Alpes (Thesis supervisor)

Submitted on December 7, 2024

Updated on December 7, 2024