Ngoc Tien Le - Advanced Quality Measures For Speech Translation

Organized by:
Ngoc Tien Le
Speaker:
Ngoc Tien Le
Detailed information:


Jury composition:

  • Mr. Yannick Estève
    Professor, Laboratoire d’Informatique de l’Université du Maine (LIUM), Le Mans Université, Reviewer
  • Mr. Georges Linarès
    Professor, Laboratoire Informatique d’Avignon (LIA), Université d’Avignon, Reviewer
  • Mr. Frédéric Béchet
    Professor, Laboratoire d’Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université, Examiner
  • Mr. Laurent Besacier
    Professor, Laboratoire d’Informatique de Grenoble (LIG), Université Grenoble Alpes, Supervisor
  • Mr. Benjamin Lecouteux
    Associate Professor (Maître de conférences), Laboratoire d’Informatique de Grenoble (LIG), Université Grenoble Alpes, Co-supervisor
Abstract:

The main aim of this thesis is to investigate the automatic quality assessment of spoken language translation (SLT), known as Confidence Estimation (CE) for SLT. Because of several factors, SLT output can be of unsatisfactory quality, which may cause various issues for the target users. It is therefore useful to know how confident we can be in the tokens of a hypothesis. The first contribution of this thesis is LIG-WCE, a customizable, flexible and portable toolkit for Word-level Confidence Estimation (WCE) of SLT. WCE for SLT is a relatively new task, defined and formalized as a sequence tagging problem in which each word of the SLT output is assigned one of two labels (good or bad) on the basis of a large feature set. We propose several word confidence estimators based on automatic evaluation of transcription (ASR) quality, translation (MT) quality, or both (combined/joint ASR+MT). We built a corpus of 6.7k utterances, each represented by a quintuplet: ASR hypothesis, verbatim transcript, text translation, speech translation and post-edition of the translation. Our WCE experiments with joint ASR and MT features show that MT features remain the most influential, while ASR features can bring useful complementary information.
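As a rough illustration of the sequence tagging formulation, the sketch below derives binary good/bad labels for an SLT hypothesis by aligning it with its post-edition. The function name `label_words` and the use of Python's `difflib.SequenceMatcher` for the alignment are assumptions made for brevity; the abstract does not state how the reference labels were obtained, and this is not the LIG-WCE implementation.

```python
from difflib import SequenceMatcher


def label_words(hypothesis, post_edition):
    """Tag each hypothesis word 'good' if it is kept in the post-edition, else 'bad'.

    Illustrative assumption: words are aligned with difflib's matching blocks.
    """
    hyp = hypothesis.split()
    ref = post_edition.split()
    labels = ["bad"] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Every hypothesis position inside a matching block survived post-editing.
        for i in range(block.a, block.a + block.size):
            labels[i] = "good"
    return list(zip(hyp, labels))


if __name__ == "__main__":
    for word, label in label_words("the cat sat on mat", "the cat sits on the mat"):
        print(word, label)
```

A word confidence estimator would then be trained to predict these labels from the feature set, without access to the post-edition.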

As another contribution, we propose two methods to disentangle ASR errors and MT errors, in which each word of the SLT hypothesis is tagged as good, asr_error or mt_error. We thus explore how WCE for SLT can help identify the source of SLT errors.
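The three-way tagging can be pictured with the toy heuristic below. The abstract does not describe the two proposed methods, so the rule used here is purely a hypothetical illustration: a bad SLT word that also appears, aligned, in the text translation (MT of the verbatim transcript) is attributed to MT, and a bad word that is absent from it is attributed to ASR. The function name `tag_error_sources` is likewise an assumption.

```python
from difflib import SequenceMatcher


def tag_error_sources(slt_words, slt_labels, mt_words):
    """Refine binary labels into good / asr_error / mt_error.

    Hypothetical rule, for illustration only: a bad SLT word shared with the
    text translation is blamed on MT; a bad word absent from it is blamed on ASR.
    """
    shared = set()
    matcher = SequenceMatcher(a=slt_words, b=mt_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        shared.update(range(block.a, block.a + block.size))

    tags = []
    for i, label in enumerate(slt_labels):
        if label == "good":
            tags.append("good")
        elif i in shared:
            tags.append("mt_error")   # the error exists even without ASR noise
        else:
            tags.append("asr_error")  # the error appears only in the speech pipeline
    return tags


if __name__ == "__main__":
    slt = ["she", "like", "red", "wine"]
    mt = ["she", "like", "bread", "wine"]
    print(tag_error_sources(slt, ["good", "bad", "bad", "good"], mt))
```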

Furthermore, we propose a simple extension of the WER metric that uses word embeddings to penalize substitution errors differently according to their context. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, since it has a more limited impact on translation performance. Our experiments show that the new metric correlates better with SLT performance than WER does. Oracle experiments are also conducted and show the ability of our metric to find better hypotheses (to be translated) in the ASR N-best list. Finally, we present and analyze a preliminary experiment in which ASR tuning is performed with our new metric.
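One way to picture such an extension is a Levenshtein alignment in which the substitution cost is no longer fixed at 1. In the sketch below it is taken to be the cosine distance between the two word embeddings, clamped to [0, 1]. This cost function, the name `embedding_wer` and the toy two-dimensional vectors are all assumptions for illustration; the abstract does not give the exact formula.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity, or 1.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def embedding_wer(reference, hypothesis, embeddings):
    """WER variant with an embedding-based substitution cost (assumed form)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    def sub_cost(r, h):
        if r == h:
            return 0.0
        if r in embeddings and h in embeddings:
            # Clamp so a substitution never costs more than in plain WER.
            return min(1.0, max(0.0, cosine_distance(embeddings[r], embeddings[h])))
        return 1.0  # fall back to the plain WER cost for unknown words

    # Standard edit-distance recurrence; insertions and deletions cost 1.
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, 1):
        curr = [float(i)]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1.0,                 # deletion
                            curr[j - 1] + 1.0,             # insertion
                            prev[j - 1] + sub_cost(r, h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    toy = {"cat": [1.0, 0.0], "cats": [0.6, 0.8], "dog": [0.0, 1.0]}
    print(embedding_wer("the cat sleeps", "the cats sleeps", toy))
    print(embedding_wer("the cat sleeps", "the dog sleeps", toy))
```

With these toy vectors the morphological variant "cats" is penalized less than the unrelated word "dog", whereas plain WER scores both hypotheses identically.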

To conclude, we have proposed several promising strategies for CE of SLT that could have a positive impact on several SLT applications. Robust quality estimators for SLT output can be used to provide feedback to the user in computer-assisted speech-to-text scenarios or to re-score ST graphs.