
Jeongwoo KANG

Tuesday April 8, 2025

Apprentissage par transfert pour l'analyse sémantique translingue (Transfer learning for cross-lingual semantic parsing)

Abstract
Abstract Meaning Representation (AMR) captures the meaning of a text and represents it as a graph. AMR graphs benefit many NLP systems by presenting information in a structured, canonical, and less ambiguous form. Building an AMR parser that automatically generates AMR graphs from natural language sentences is therefore an important task. However, building such models for French faces a major challenge: a lack of data, both for evaluation and for training. This research addresses these challenges through three contributions aimed at advancing French AMR parsing.
First, we develop multilingual AMR evaluation data. These datasets come in two quality levels (gold and silver), depending on the degree of manual intervention. The gold data is obtained by manual alignment: we leverage an English AMR corpus, The Little Prince, and manually align it to multilingual translations. This careful manual alignment ensures the high quality and reliability of the data. The silver data, in contrast, is obtained by machine translation without manual quality control, so we evaluate it both intrinsically and extrinsically to assess its reliability.
Second, we investigate zero- and few-shot learning to train AMR parsers for target languages without target-language training data. Specifically, we experiment with two methods: Speaking the Graph Language via Multilingual Translation (SGL; Procopio et al., 2021) and meta-learning. For SGL, we compare bilingual and multilingual configurations to identify the optimal setup for zero-shot learning. With meta-learning, we train a model to adapt quickly to new target languages from minimal examples. Finally, we compare our meta-learning approach with joint learning to assess its effectiveness for cross-lingual AMR parsing.
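The meta-learning idea, learning an initialisation that adapts to a new language from a handful of examples, can be illustrated with a toy first-order MAML loop. This is a hypothetical sketch on scalar regression "tasks", not the thesis's actual sequence-to-sequence setup; the function names and hyperparameters are illustrative only.

```python
import random

# Toy first-order MAML: each "task" (standing in for a target language)
# is fitting y = a * x for an unknown slope a. The meta-learner seeks an
# initial weight w that reaches any task's optimum after ONE inner step.
def loss_grad(w, a, xs):
    # Gradient of the mean squared error of y = w*x against y = a*x.
    return sum(2 * (w - a) * x * x for x in xs) / len(xs)

def maml(meta_steps=500, inner_lr=0.05, meta_lr=0.01):
    random.seed(0)
    w = 0.0                     # meta-learned initialisation
    xs = [0.5, 1.0, 1.5]        # shared support inputs
    for _ in range(meta_steps):
        a = random.uniform(1.0, 3.0)                 # sample a task
        w_fast = w - inner_lr * loss_grad(w, a, xs)  # inner adaptation
        # First-order approximation: meta-gradient taken at w_fast.
        w -= meta_lr * loss_grad(w_fast, a, xs)
    return w

w0 = maml()  # ends near the task mean (slope 2.0)
```

From `w0`, a single gradient step on a few examples of a new task already yields a near-optimal fit; this fast-adaptation property is what meta-learning exploits for low-resource target languages.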
Finally, we design an alternative way to linearize AMR graphs for sequence-to-sequence model training. Sequence-to-sequence AMR parsing has recently gained attention for its simplicity and efficiency, and a prerequisite for such methods is the linearization of AMR graphs. Penman encoding has been the common choice for AMR linearization, but we hypothesize that it has limitations in capturing the deep graph structure of AMR. We propose instead to linearize graphs as triples, and we evaluate our method along several dimensions, focusing in particular on graph depth and length.
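To illustrate the contrast (an illustrative sketch only; the exact linearization scheme used in the thesis may differ): in Penman notation, "The boy wants to go" is encoded as the nested string `(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))`, while a triple-based view flattens the same graph into `(source, role, target)` facts.

```python
# Minimal sketch of triple-based AMR linearization. A node is represented
# as (variable, concept, edges); an edge target is either a nested node or
# a bare variable for re-entrancies (like the boy 'b' below).
def amr_to_triples(node):
    var, concept, edges = node
    triples = [(var, ':instance', concept)]   # concept triple first
    for role, target in edges:
        if isinstance(target, tuple):         # nested node: recurse
            triples.append((var, role, target[0]))
            triples.extend(amr_to_triples(target))
        else:                                 # re-entrant variable
            triples.append((var, role, target))
    return triples

# "The boy wants to go": note the re-entrant :ARG0 of go-02 pointing to b.
boy_wants_to_go = ('w', 'want-01', [
    (':ARG0', ('b', 'boy', [])),
    (':ARG1', ('g', 'go-02', [(':ARG0', 'b')])),
])

for t in amr_to_triples(boy_wants_to_go):
    print(t)
# ('w', ':instance', 'want-01')
# ('w', ':ARG0', 'b')
# ('b', ':instance', 'boy')
# ('w', ':ARG1', 'g')
# ('g', ':instance', 'go-02')
# ('g', ':ARG0', 'b')
```

Unlike Penman's nested parentheses, the triple sequence keeps every edge at the same surface depth in the output string, which is the property examined for deep graphs.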
 
Keywords: Semantic analysis, Abstract Meaning Representation (AMR), Zero-shot learning, Cross-lingual transfer learning
 

Date and place

Tuesday April 8 at 9:00
Amphithéâtre, Maison Jean Kuntzman
and Zoom

Jury members

Didier Schwab
Professor, Université Grenoble Alpes, Thesis supervisor
Marie Candito
Associate Professor, Université Paris Cité, Reviewer
Patrice Bellot
Professor, Aix-Marseille Université, Reviewer
Eric Gaussier
Professor, Université Grenoble Alpes, Examiner
Chloe Braud
Research Scientist, IRIT-CNRS, Examiner
Maximin Coavoux
Research Scientist, LIG-CNRS, Thesis co-supervisor (invited)
Cédric Lopez
Research Director, Emvista, Thesis co-supervisor (invited)

Submitted on March 28, 2025

Updated on March 28, 2025