Gabriel Job Antunes GRABHER

Monday, May 4th, 2026

Towards practical performance anomaly detection in microservice applications

Abstract: 
Microservices are a popular architecture for building scalable and maintainable distributed applications. However, performance anomaly detection in such systems is challenging due to the diverse resource usage patterns exhibited by services, frequent system changes, and the large volume of monitoring data. These factors make it difficult to accurately distinguish normal from anomalous behavior while maintaining a practical and efficient detection solution. 
In the first part of this thesis, we explore a practical approach for detecting performance anomalies in microservice systems by proposing a new machine learning model, called ctl-SRNN. The approach relies on local models: one ctl-SRNN model is built to monitor each microservice using unlabeled, service-specific resource usage time series data. This design allows each model to be modified independently in response to service changes while reducing training and configuration time. The ctl-SRNN model is designed for efficient training and automatic threshold generation. It uses a Dynamical Variational Autoencoder that captures temporal dependencies in the data, along with a control variable that accounts for workload variations in the service. Experimental results show that ctl-SRNN outperforms existing solutions by 91% in detection accuracy while requiring only a small amount of training data and minimal configuration.
Ultimately, the output of an anomaly detector is used to guide actions that address the detected anomalies. However, detectors can make mistakes, and it is difficult to determine how these mistakes impact service performance and resource usage cost. The second part of this thesis analyzes how detector characteristics (i.e., precision, recall, and inspection frequency) affect the performance-to-cost trade-off. Using Stochastic Reward Nets, we create a statistical model of a service monitored by a performance anomaly detector. With this model, we perform numerical analyses to study the impact of detector characteristics on the performance and resource usage of the monitored service. Our results show that achieving both high precision and high recall is not always necessary. If detection can be run frequently, high precision is enough to obtain a good performance-to-cost trade-off, but if the detector is run infrequently, recall becomes the most important characteristic.
 
Keywords: Microservices, Cloud, Anomaly Detection, Dynamical Variational Autoencoders, Uncertainty Quantification, Stochastic Reward Nets, Stochastic Models.
 

Date and place

Monday, May 4th at 13:30 
Maison du Doctorat Jean Kuntzmann, amphitheater

Jury members

Thesis supervision
Noel DE PALMA
Thesis supervisor, Full Professor, Université Grenoble Alpes
Thomas ROPARS
Thesis co-supervisor, Associate Professor, Université Grenoble Alpes
 
Thesis committee
Noel DE PALMA
Thesis supervisor, Full Professor, Université Grenoble Alpes
Etienne RIVIÈRE
Reviewer, Professor, Université Catholique Louvain
Romain ROUVOY
Reviewer, Full Professor, Université de Lille
Sara BOUCHENAK
Examiner, Full Professor, Institut National des Sciences Appliquées Lyon
Didier DONSEZ
Examiner, Full Professor, Université Grenoble Alpes
Lucas MELLO SCHNORR
Examiner, Associate Professor, Federal University of Rio Grande do Sul (UFRGS)

Submitted on April 21, 2026

Updated on April 21, 2026