Challenges and Remedies for Context-Aware Neural Machine Translation
Tuesday, March 28, 2023


Current neural machine translation systems have reached close-to-human quality in translating stand-alone sentences.
When it comes to translating whole documents, however, machine translation still has significant room for improvement.
In fact, some ambiguous discourse elements have multiple valid translations at the sentence level but only one at the document level, where extra-sentential context resolves the ambiguity.
Retrieving and exploiting such context to produce consistent document-level translations represents a challenging task.
Many researchers have taken up this challenge in recent years and proposed approaches to context-aware neural machine translation.
A common taxonomy divides them into two families: multi-encoding and single-encoding approaches, also known as concatenation approaches.
The former family includes the approaches that use the standard encoder-decoder architecture to produce latent representations of the current sentence and add learnable modules to encode and integrate its context, i.e., the preceding or following sentences.
Concatenation approaches, instead, rely entirely on the encoder-decoder architecture, but they concatenate the context to the current sentence before feeding it into the system.
In this work, we analyze both families of approaches to context-aware neural machine translation, identify some of their weaknesses, and address them with novel solutions.
For multi-encoding systems, we identify two learning challenges faced by the modules that handle context: the sparsity of the training signal and the sparsity of disambiguating contextual elements.
We introduce a novel pre-training setting in which sparsity is alleviated and demonstrate its effectiveness in fostering the learning process.
For concatenation approaches, we address the challenge of dealing with long sequences by proposing a training objective that encourages the model to focus on the most relevant parts of each sequence.
We couple this training objective with a novel technique to strengthen sentence boundaries and analyze their impact on the learned attention mechanism.
Finally, we present a comparative study of various methods for discerning segments in the concatenation sequence, including novel variants of segment embeddings.
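
To make the concatenation setting more concrete, the following is a minimal sketch of how an input sequence might be assembled, not the implementation used in this work. It joins the context sentences and the current sentence with a separator token and assigns each token a segment id that could index a segment-embedding table. The `<sep>` token, the whitespace tokenization, the function name, and the choice of numbering segments by distance from the current sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical separator token marking sentence boundaries


def concatenate_with_segments(
    context: List[str], current: str, sep: str = SEP
) -> Tuple[List[str], List[int]]:
    """Concatenate context sentences and the current sentence.

    Returns the token sequence and one segment id per token. Segment ids
    count the distance from the current sentence (0 = current sentence,
    1 = the sentence just before it, and so on). This numbering is an
    illustrative assumption, not a scheme taken from the source.
    """
    sentences = context + [current]
    tokens: List[str] = []
    segment_ids: List[int] = []
    for idx, sentence in enumerate(sentences):
        words = sentence.split()  # naive whitespace tokenization
        seg = len(sentences) - 1 - idx
        tokens.extend(words)
        segment_ids.extend([seg] * len(words))
        if idx < len(sentences) - 1:
            # The separator is assigned to the segment it closes.
            tokens.append(sep)
            segment_ids.append(seg)
    return tokens, segment_ids


tokens, segment_ids = concatenate_with_segments(
    ["He bought a bat ."], "It flew away ."
)
print(tokens)
print(segment_ids)
```

In a real system the segment ids would typically be mapped to learned vectors and added to the token and position embeddings; the sketch stops at the ids because the variants compared in this work are not specified here.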

Updated on March 28, 2023