Friday 9 July 2021
Models and Resources for Attention-based Unsupervised Word Segmentation

In this thesis we investigate the task of Unsupervised Word Segmentation (UWS) from speech. The goal of this task is to segment utterances into smaller chunks corresponding to the words of the language, without access to any written transcription. We propose to ground the word segmentation process in aligned bilingual information, motivated by the fact that translations are often collected by linguists during language documentation. Using bilingual corpora made of speech utterances and sentence-aligned translations, we use attention-based Neural Machine Translation (NMT) models to align and segment. Since speech processing is known to require considerable amounts of data, we split the approach into two steps. We first perform Speech Discretization (SD), transforming input utterances into sequences of discrete speech units. We then train NMT models, which output soft-alignment probability matrices between the units and the words of the translation. This attention-based soft alignment is used to segment the unit sequences according to the bilingual alignment obtained, and the resulting segmentation is carried over to the speech signal. Throughout this work, we investigate the use of different models for these two tasks (SD and NMT). Our results suggest that, in realistic settings and across different languages, attention-based UWS is competitive with dpseg, the nonparametric Bayesian model we use as our baseline. Moreover, our approach has the advantage of retrieving bilingual annotation for the word segments it produces.
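
To make the segmentation step concrete, the following is a minimal sketch of how a soft-alignment matrix might be turned into word boundaries. It is an illustration under stated assumptions, not necessarily the exact procedure used in the thesis: the function name `segment_from_attention`, the matrix layout (one row per discrete unit, one column per translation word), and the toy values are all hypothetical. Each unit is hard-aligned to its most probable translation word, and a boundary is placed wherever two neighbouring units are aligned to different words.

```python
def segment_from_attention(units, attention):
    """Group discrete speech units into word-like segments.

    units:     list of T discrete unit symbols (output of an SD step).
    attention: list of T rows; row t holds the soft-alignment
               probabilities of unit t over the translation words.
               (Assumed layout: rows = units, columns = words.)

    Returns a list of (segment_units, word_index) pairs, so each
    segment keeps the index of the translation word it aligned to.
    """
    if len(units) != len(attention):
        raise ValueError("need exactly one attention row per unit")
    if not units:
        return []

    # Hard alignment: index of the most probable translation word per unit.
    aligned = [max(range(len(row)), key=row.__getitem__) for row in attention]

    # Start a new segment whenever the aligned word changes.
    segments = [([units[0]], aligned[0])]
    for unit, word_idx in zip(units[1:], aligned[1:]):
        if word_idx == segments[-1][1]:
            segments[-1][0].append(unit)
        else:
            segments.append(([unit], word_idx))
    return segments


# Toy example: five units, a two-word translation (made-up values).
toy_units = ["u1", "u2", "u3", "u4", "u5"]
toy_attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.4, 0.6],
]
toy_segments = segment_from_attention(toy_units, toy_attention)
```

In this toy case the first two units align to the first translation word and the last three to the second, so a single boundary falls between `u2` and `u3`. Because each segment carries the index of its aligned word, the sketch also shows where the bilingual annotation mentioned in the abstract could come from. Mapping the unit boundaries back to timestamps in the speech signal would require the time spans of the discrete units, which this sketch does not model.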

Updated on 5 July 2021