Fixing Translation Divergences in Parallel Corpora for Neural MT - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper, Year: 2018

Fixing Translation Divergences in Parallel Corpora for Neural MT

Minh Quang Pham
Josep-Maria Crego
  • Role: Author
  • PersonId: 1038144
Jean Senellart
  • Role: Author
  • PersonId: 1038145
François Yvon

Abstract

Corpus-based approaches to machine translation rely on the availability of clean parallel corpora. Such resources are scarce, and because of the automatic processes involved in their preparation, they are often noisy. This paper describes an unsupervised method for detecting translation divergences in parallel sentences. We rely on a neural network that computes cross-lingual sentence similarity scores, which are then used to effectively filter out divergent translations. Furthermore, similarity scores predicted by the network are used to identify and fix some partial divergences, yielding additional parallel segments. We evaluate these methods for English-French and English-German machine translation tasks, and show that using filtered/corrected corpora actually improves MT performance.
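The abstract describes two uses of a similarity score: discarding fully divergent sentence pairs, and repairing partially divergent ones so they yield usable parallel segments. The sketch below only illustrates the shape of such a pipeline, under loudly stated assumptions. The neural cross-lingual scorer is replaced by a hypothetical toy bilingual lexicon (`LEXICON`), the 0.5 threshold is arbitrary, and the repair step (keeping the contiguous target span with the highest total token score, found with Kadane's maximum-subarray algorithm) is an illustrative heuristic, not necessarily the correction procedure used in the paper. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]

# HYPOTHETICAL toy English->French lexicon standing in for a learned
# cross-lingual model; the paper uses a neural network instead.
LEXICON: Dict[str, str] = {
    "the": "le", "cat": "chat", "sleeps": "dort", "dog": "chien", "eats": "mange",
}
INVERSE: Dict[str, str] = {fr: en for en, fr in LEXICON.items()}


def similarity(src: str, tgt: str) -> float:
    """Toy sentence score in [0, 1]: mean of source-side and target-side coverage."""
    s, t = src.lower().split(), tgt.lower().split()
    if not s or not t:
        return 0.0
    s_set, t_set = set(s), set(t)
    src_cov = sum(LEXICON.get(w) in t_set for w in s) / len(s)
    tgt_cov = sum(INVERSE.get(w) in s_set for w in t) / len(t)
    return (src_cov + tgt_cov) / 2


def token_scores(src: str, tgt: str) -> List[float]:
    """+1 for each target token whose toy translation occurs in the source, else -1."""
    s_set = set(src.lower().split())
    return [1.0 if INVERSE.get(w) in s_set else -1.0 for w in tgt.lower().split()]


def filter_pairs(pairs: Sequence[Pair], threshold: float = 0.5) -> List[Pair]:
    """Drop pairs whose similarity falls below the (assumed) threshold."""
    return [(s, t) for s, t in pairs if similarity(s, t) >= threshold]


def trim_target(src: str, tgt: str) -> str:
    """Keep the contiguous target span with the highest total token score (Kadane)."""
    tokens = tgt.split()
    scores = token_scores(src, tgt)
    best_sum, best = float("-inf"), (0, 0)
    run_sum, run_start = 0.0, 0
    for i, sc in enumerate(scores):
        if run_sum <= 0:  # a non-positive prefix never helps: restart here
            run_sum, run_start = sc, i
        else:
            run_sum += sc
        if run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return " ".join(tokens[best[0]:best[1]])


if __name__ == "__main__":
    corpus = [
        ("the cat sleeps", "le chat dort"),                         # parallel
        ("the dog eats", "le chat dort"),                           # divergent
        ("the cat sleeps", "le chat dort , photo : Getty Images"),  # partially divergent
    ]
    kept = filter_pairs(corpus, threshold=0.5)
    fixed = [(s, trim_target(s, t)) for s, t in kept]
    print(fixed)
    # prints [('the cat sleeps', 'le chat dort'), ('the cat sleeps', 'le chat dort')]
```

Filtering and repairing are kept as separate steps so that the threshold and the trimming rule can be tuned independently. With a real model, `similarity` and `token_scores` would be replaced by sentence-level and word-level outputs of the network.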
Main file
D18-1328.pdf (440.69 KB)
Origin: Publisher files authorized for deposit in an open archive

Dates and versions

hal-01908309, version 1 (30-10-2018)

Identifiers

  • HAL Id: hal-01908309, version 1

Cite

Minh Quang Pham, Josep-Maria Crego, Jean Senellart, François Yvon. Fixing Translation Divergences in Parallel Corpora for Neural MT. Conference on Empirical Methods in Natural Language Processing, Nov 2018, Brussels, Belgium. pp. 2967-2973. ⟨hal-01908309⟩