Vasilii Feofanov

Wednesday, September 29, 2021

Multi-class Classification and Feature Selection with Partially Labeled Data

ABSTRACT

Learning with partially labeled data, known as semi-supervised learning, deals with problems where few training examples are labeled while unlabeled data are abundant and valuable for training. In this thesis, we study this framework in the multi-class classification case with a focus on self-learning and feature selection. Self-learning is a classical approach that iteratively assigns pseudo-labels to unlabeled training examples whose confidence score is above a predetermined threshold. This pseudo-labeling technique is prone to error and runs the risk of adding noisy labels to the training data. Our first contribution is a theoretical framework for analyzing self-learning in the multi-class case. We derive a transductive bound on the risk of the multi-class majority vote classifier and propose to use this bound to choose the pseudo-labeling threshold automatically. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier on pseudo-labeled data. We derive a probabilistic C-bound on the majority vote error given imperfect labels. Our second contribution is an extension of the self-learning strategy to the case where some unlabeled examples come from classes not previously seen. The new approach is applied to the classification of real biological data and assumes the existence of clusters in the unlabeled data.
Finally, we propose an approach for semi-supervised feature selection that utilizes self-learning to increase the variety of training data and a new modification of the genetic algorithm to perform a feature subset search. The proposed genetic algorithm produces both a sparse and accurate solution by considering feature weights during its evolutionary process and iteratively removing irrelevant features.
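
For readers unfamiliar with self-learning, the loop described in the abstract can be sketched as follows. This is only a minimal illustration, not the method of the thesis: it assumes a nearest-centroid base classifier with a softmax confidence score (the thesis studies a majority vote classifier), and it uses a fixed threshold rather than one chosen from the transductive bound. The function names fit_centroids, predict_proba and self_learning are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Return {label: centroid}; an assumed stand-in for the base classifier."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, x):
    """Softmax over negative squared distances, used as a confidence score."""
    scores = {
        c: -sum((a - b) ** 2 for a, b in zip(x, mu))
        for c, mu in centroids.items()
    }
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def self_learning(X_lab, y_lab, X_unl, threshold=0.9, max_iter=10):
    """Iteratively pseudo-label unlabeled points whose confidence >= threshold.

    The threshold is fixed here (an assumption of this sketch).
    Returns the final centroids and the points that were never pseudo-labeled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_iter):
        centroids = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(centroids, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                # Confident prediction: move the point to the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough is left; stop early
            break
    return fit_centroids(X_lab, y_lab), pool
```

With a high threshold, ambiguous points (for example one equidistant from all class centroids) stay unlabeled, which limits but does not eliminate the noisy pseudo-labels mentioned in the abstract.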

Date and Location

Wednesday, September 29, 2021 at 2:00 PM
In the auditorium of the IMAG building
and https://univ-grenoble-alpes-fr.zoom.us/j/96730359488?pwd=aTg0ZkZGVlZUUFh6c3J1TC9aWG02dz09

Supervisors

Massih-Reza AMINI
Professor, UGA
Emilie DEVIJVER
Research Scientist, CNRS and UGA

Jury Members

Anatoli IOUDITSKI
Professor, UGA, Examiner
Laurent BESACIER
Professor, NAVER LABS Europe and UGA, Examiner
Mélina GALLOPIN
Associate Professor, Université Paris-Saclay, Examiner
Pascal GERMAIN
Associate Professor, Université Laval, Reviewer
Florence d'ALCHE-BUC
Professor, Télécom Paris and Institut Polytechnique de Paris, Reviewer
Massih-Reza AMINI
Professor, UGA, Supervisor

Published on September 21, 2021

Updated on September 21, 2021