Tuesday, March 11, 2025
Automatic speech translation into pictograms
Abstract:
Augmentative and Alternative Communication (AAC) provides methods and tools to address impairments in speech production and comprehension. Pictograms, key elements of AAC, facilitate the communication of thoughts and emotions through simplified iconography. However, myths and economic barriers hinder the widespread adoption of AAC, highlighting the need for tailored solutions. Automatic Speech-to-Pictogram translation, a new task in Natural Language Processing (NLP), aims to generate pictogram sequences from spoken utterances. At the intersection of AAC and Speech-to-text translation (ST), this task can facilitate communication between caregivers (medical staff, family members) and individuals with language disorders. Nevertheless, it faces major challenges, including a lack of unified multimodal data, the absence of a precise evaluation framework, and the need for specialized neural models to perform pictogram translation.
In this thesis, we present three contributions to address these challenges. We introduce two methods for creating multimodal corpora aligning speech, text, and pictograms. The first method includes a grammar and a restricted vocabulary to generate a sequence of pictograms from the transcription, while the second integrates a processing pipeline to retrieve the audio from texts already translated into pictograms. Together, these methods create robust datasets for model training and evaluation.
In our second contribution, we define a specialized evaluation framework, combining both automatic and human evaluations. We adapt metrics commonly used in Automatic Speech Recognition (ASR) and Machine Translation (MT) to effectively compare models' performance. Additionally, we apply an analytical framework to interpret the quality of the translations.
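As an illustration only, the idea of adapting an ASR metric to pictogram output can be sketched with a word-error-rate-style score computed over pictogram tokens instead of words. The abstract does not say which metrics the thesis adapts or how, so the function names `edit_distance` and `token_error_rate` and the pictogram labels below are assumptions made for this sketch.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def token_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER-style score: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical pictogram labels, invented for this example.
reference = ["i", "want", "drink", "water"]
hypothesis = ["i", "drink", "water"]
score = token_error_rate(reference, hypothesis)  # one deletion over four tokens
```

A lower score means the predicted pictogram sequence is closer to the reference; like word error rate, the score can exceed 1 when the hypothesis is much longer than the reference.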
Finally, in our third contribution, we investigate two approaches, cascade and end-to-end, for generating pictogram sequences from speech. We compare state-of-the-art ASR, MT, and ST models, trained or fine-tuned on the multimodal data created. Our evaluation results demonstrate the ability of cascade models to produce intelligible pictogram translations from read speech in everyday life situations. We also achieve competitive results with an end-to-end model for spontaneous speech, an ongoing challenge in NLP. The code, data, and models developed are freely available.
Date and location
Jury composition
Benjamin LECOUTEUX
Full Professor, Université Grenoble Alpes, Thesis supervisor
Iris ESHKOL-TARAVELLA
Full Professor, Université Paris 10 - Nanterre, Reviewer
Frédéric BÉCHET
Full Professor, Aix-Marseille Université, Reviewer
Didier SCHWAB
Full Professor, Université Grenoble Alpes, Thesis co-supervisor
Nathalie CAMELIN
Associate Professor, Avignon Université, Examiner
François PORTET
Full Professor, Université Grenoble Alpes, Examiner