Monday 22 June 2020
Rethinking the Design of Sequence-to-Sequence Models for Efficient Machine Translation

In recent years, deep learning has enabled impressive achievements in Machine Translation. Neural Machine Translation (NMT) relies on training deep neural networks with a large number of parameters on vast amounts of parallel data to learn how to translate from one language to another. One crucial factor in the success of NMT is the design of new, powerful and efficient architectures. State-of-the-art systems are encoder-decoder models that first encode a source sequence into a set of feature vectors and then decode the target sequence conditioned on the source features. In this thesis, we question the encoder-decoder paradigm and advocate for an intertwined encoding of the source and target so that the two sequences interact at increasing levels of abstraction. For this purpose, we introduce Pervasive Attention, a model based on two-dimensional convolutions that jointly encode the source and target sequences with interactions that are pervasive throughout the network. To improve the efficiency of NMT systems, we explore online machine translation, where the source is read incrementally and the decoder is fed partial contexts so that the model can alternate between reading and writing. We investigate deterministic agents that guide the read/write alternation through a rigid decoding path, and introduce new dynamic agents that estimate a decoding path for each sample. We also address the resource efficiency of encoder-decoder models and posit that going deeper in a neural network is not required for all instances. We design depth-adaptive Transformer decoders that allow for anytime prediction, together with sample-adaptive halting mechanisms that favor low-cost predictions for low-complexity instances and save deeper predictions for complex scenarios.
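
To make the idea of a rigid read/write decoding path concrete, here is a minimal sketch of one possible deterministic agent. The abstract does not specify the schedule, so the rule used here (read an initial lag of k source tokens, then alternate one write with one read) is an assumption chosen for illustration, and the names `deterministic_path`, `src_len`, `tgt_len` and `k` are hypothetical.

```python
from typing import List


def deterministic_path(src_len: int, tgt_len: int, k: int) -> List[str]:
    """Return a fixed sequence of "R" (read) and "W" (write) actions.

    Assumed schedule (not taken from the abstract): before writing the
    t-th target token (0-indexed), the agent must have read k + t source
    tokens, or the whole source if it is shorter than that.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[str] = []
    read = 0
    for written in range(tgt_len):
        # Number of source tokens required before this write.
        needed = min(k + written, src_len)
        while read < needed:
            actions.append("R")
            read += 1
        actions.append("W")
    return actions


if __name__ == "__main__":
    # Path for a 4-token source, a 4-token target and a lag of 2.
    print("".join(deterministic_path(4, 4, 2)))
```

The path depends only on the sequence lengths and k, never on the content of the sentence, which is what makes it rigid. A dynamic agent, by contrast, would choose between reading and writing at each step based on the sample being translated.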

Updated on 28 December 2020