Friday, July 9, 2021
Models and Resources for Attention-based Unsupervised Word Segmentation
Abstract
In this thesis we investigate the task of Unsupervised Word Segmentation (UWS) from speech. The goal of this task is to segment utterances into smaller chunks corresponding to the words of the language, without access to any written transcription. We propose to ground the word segmentation process in aligned bilingual information. This is inspired by the possible availability of translations, which linguists often collect during language documentation. Using bilingual corpora made of speech utterances and sentence-aligned translations, we propose the use of attention-based Neural Machine Translation (NMT) models to align and segment. Since speech processing is known to require considerable amounts of data, we split this approach into two steps. We first perform Speech Discretization (SD), transforming input utterances into sequences of discrete speech units. We then train NMT models, which output soft-alignment probability matrices between units and word translations. This attention-based soft alignment is used to segment the units according to the bilingual alignment obtained, and the final segmentation is carried over to the speech signal. Throughout this work, we investigate the use of different models for these two tasks (SD and NMT). Our results suggest that, in realistic settings and across different languages, attention-based UWS is competitive with our baseline, the nonparametric Bayesian model (dpseg). Moreover, our approach has the advantage of retrieving bilingual annotation for the word segments it produces.
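The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes one common reading of attention-based segmentation: each discrete unit is assigned to the translation word it attends to most, and a boundary is placed wherever two neighboring units are assigned to different words. The function name, the argmax rule, and the toy data are illustrative assumptions, not the implementation used in the thesis.

```python
from typing import List, Sequence, Tuple


def segment_from_attention(
    units: Sequence[str],
    words: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[Tuple[str, List[str]]]:
    """Group consecutive units that share the same most-attended word.

    attention[i][j] is the soft-alignment weight between unit i and
    translation word j (hypothetical layout: one row per unit).
    Returns a list of (aligned word, units in the segment) pairs.
    """
    if len(attention) != len(units):
        raise ValueError("attention needs exactly one row per unit")

    segments: List[Tuple[str, List[str]]] = []
    previous = None  # index of the word aligned to the previous unit
    for unit, row in zip(units, attention):
        if len(row) != len(words):
            raise ValueError("each attention row needs one weight per word")
        # Hard assignment: pick the word with the highest attention weight.
        best = max(range(len(words)), key=row.__getitem__)
        if best != previous:
            # The aligned word changed, so a new segment starts here.
            segments.append((words[best], [unit]))
            previous = best
        else:
            segments[-1][1].append(unit)
    return segments


# Toy example with made-up values: five discrete units, two translation words.
toy_units = ["u1", "u2", "u3", "u4", "u5"]
toy_words = ["the", "dog"]
toy_attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.4, 0.6],
]
toy_segments = segment_from_attention(toy_units, toy_words, toy_attention)
print(toy_segments)
```

Each returned pair keeps the translation word next to the units it was aligned with, which mirrors the bilingual annotation of word segments mentioned in the abstract. Mapping unit boundaries back to timestamps in the speech signal is left out of this sketch.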
Date and Venue
Friday, July 9, 2021 at 2:30 pm
LIG Auditorium (attendance limited to 34 people)
https://univ-grenoble-alpes-fr.zoom.us/j/97019916100?pwd=Rnk2ZkxrUTgxdFMxaDBBVkgrZkJ4UT09
Organized by
Marcely ZANON BOITO
GETALP team
Jury Members
Laurent BESACIER
Professor, UGA and Naver Labs Europe, Supervisor
Aline VILLAVICENCIO
Associate Professor, Sheffield University and UFRGS, Supervisor
François PORTET
Professor, UGA
Thierry POIBEAU
Research Director, CNRS, ENS/PSL and Université Sorbonne Nouvelle
Karen LIVESCU
Associate Professor, Toyota Technological Institute at Chicago
Claire GARDENT
Research Director, CNRS and Université de Lorraine