Papers and publications

Company information

With more than 50 years of experience in translation technologies, SYSTRAN has pioneered the greatest innovations in the field, including the first web-based translation portals and the first neural machine translation engines for businesses and public organizations.

SYSTRAN provides business users with advanced and secure automated translation solutions in areas such as global collaboration, multilingual content production, customer support, electronic investigation, Big Data analysis, and e-commerce. SYSTRAN offers a tailor-made solution with an open and scalable architecture that enables seamless integration into existing third-party applications and IT infrastructures.

Rosetta-LSF: an Aligned Corpus of French Sign Language and French for Text-to-Sign Translation

Elise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Boris Dauriac, Michael Filhol, Emmanuella Martinod, Jérémie Segouat.

13th Conference on Language Resources and Evaluation (LREC 2022), June 2022, Marseille, France.

Joint Generation of Captions and Subtitles with Dual Decoding

As the amount of audio-visual content increases, developing automatic captioning and subtitling solutions that match the expectations of a growing international audience appears to be the only viable way to boost throughput and lower the related post-production costs. Automatic captioning and subtitling often need to be tightly intertwined to achieve an appropriate level of consistency and synchronization with each other and with the video signal. In this work, we assess a dual decoding scheme to achieve a strong coupling between these two tasks and show how adequacy and consistency are increased, with virtually no additional cost in terms of model size and training complexity.

Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), May 2022, Dublin, Ireland
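The dual decoding scheme assessed in the entry above couples a caption decoder and a subtitle decoder over a shared encoder. Below is a minimal, hypothetical PyTorch sketch of one way such a coupling can work (a second decoding pass in which each decoder also attends to the other's first-pass states); module names, sizes, and the exact coupling mechanism are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DualDecoderModel(nn.Module):
    def __init__(self, vocab=1000, d=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, heads, batch_first=True), layers)
        # One decoder per task: captions (verbatim) and subtitles (condensed).
        self.dec_cap = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, heads, batch_first=True), layers)
        self.dec_sub = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, heads, batch_first=True), layers)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, cap_prefix, sub_prefix):
        memory = self.encoder(self.embed(src))
        # First pass: each decoder attends only to the shared encoder memory.
        h_cap = self.dec_cap(self.embed(cap_prefix), memory)
        h_sub = self.dec_sub(self.embed(sub_prefix), memory)
        # Coupling pass: each decoder also attends to the other's states,
        # which is what keeps captions and subtitles consistent.
        h_cap2 = self.dec_cap(self.embed(cap_prefix),
                              torch.cat([memory, h_sub], dim=1))
        h_sub2 = self.dec_sub(self.embed(sub_prefix),
                              torch.cat([memory, h_cap], dim=1))
        return self.out(h_cap2), self.out(h_sub2)

model = DualDecoderModel()
src = torch.randint(0, 1000, (1, 20))    # e.g. tokens of the ASR transcript
cap = torch.randint(0, 1000, (1, 12))    # caption prefix
sub = torch.randint(0, 1000, (1, 10))    # subtitle prefix
logits_cap, logits_sub = model(src, cap, sub)
```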

SYSTRAN @ WMT 2021: Terminology Task

This paper describes the SYSTRAN submissions to the WMT 2021 terminology shared task. We participate in the English-to-French translation direction with a standard Transformer neural machine translation network that we enhance with the ability to dynamically include terminology constraints, a very common industrial practice. Two state-of-the-art terminology insertion methods are evaluated, based (i) on the use of placeholders complemented with morphosyntactic annotation and (ii) on the use of target constraints injected into the source stream. Results show the suitability of the presented approaches in the evaluated scenario, where terminology is used in a system trained on generic data only.

MinhQuang Pham, Antoine Senellart, Dan Berrebbi, Josep Maria Crego, Jean Senellart

Proceedings of the Sixth Conference on Machine Translation (WMT), Online, November 10-11, 2021
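Of the two constraint-insertion methods evaluated in the entry above, the second injects target constraints directly into the source stream. A toy sketch of that preprocessing step; the tag names and function are illustrative assumptions, not the paper's exact scheme.

```python
def inject_constraints(source_tokens, terminology):
    """terminology maps a source term to its required target translation."""
    out = []
    for tok in source_tokens:
        out.append(tok)
        if tok in terminology:
            # Reserved tags let the network tell source words apart from
            # the injected target words it should copy into its output.
            out += ["<trans>", terminology[tok], "</trans>"]
    return out

src = "the patient shows symptoms of fever".split()
print(" ".join(inject_constraints(src, {"fever": "fièvre"})))
# -> the patient shows symptoms of fever <trans> fièvre </trans>
```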

Revisiting Multi-Domain Machine Translation

When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of recent work that falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim of formulating the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met and suggest that further work is needed to better analyze the current behaviour of multi-domain …

MinhQuang Pham, Josep Maria Crego, François Yvon

Transactions of the Association for Computational Linguistics, 9:17–35, February 1, 2021

Integrating Domain Terminology into Neural Machine Translation

This paper extends existing work on terminology integration into Neural Machine Translation, a common industrial practice to dynamically adapt translation to a specific domain. Our method, based on the use of placeholders complemented with morphosyntactic annotation, efficiently taps into the ability of the neural network to deal with symbolic knowledge, surpassing the surface generalization shown by alternative techniques. We compare our approach to state-of-the-art systems and benchmark them through a well-defined evaluation framework, focusing on the actual application of terminology and not just on overall performance. Results indicate the suitability of our method in the use case where terminology is used in a system trained on generic data only.

Elise Michon, Josep Maria Crego, Jean Senellart

Proceedings of the 28th International Conference on Computational Linguistics, December 2020
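To illustrate the placeholder mechanism described in the entry above: source terms are replaced by reserved placeholder tokens (here carrying a coarse morphosyntactic tag) before translation, then mapped back to the terminology target after decoding. The tag format and helper names below are illustrative assumptions, not the paper's actual scheme.

```python
def to_placeholders(sentence, terminology):
    """Replace each known source term with a numbered placeholder token."""
    mapping = {}
    for i, (src_term, tgt_term, morph) in enumerate(terminology):
        placeholder = f"｟TERM_{i}_{morph}｠"   # morph: e.g. noun, singular
        if src_term in sentence:
            sentence = sentence.replace(src_term, placeholder)
            mapping[placeholder] = tgt_term
    return sentence, mapping

def from_placeholders(translation, mapping):
    """Restore the terminology targets in the decoded output."""
    for placeholder, tgt_term in mapping.items():
        translation = translation.replace(placeholder, tgt_term)
    return translation

terms = [("hard drive", "disque dur", "N_SG")]
src, mapping = to_placeholders("replace the hard drive first", terms)
print(src)                    # replace the ｟TERM_0_N_SG｠ first
# ... the NMT system translates `src`, keeping the placeholder intact ...
hyp = "remplacez d'abord le ｟TERM_0_N_SG｠"
print(from_placeholders(hyp, mapping))   # remplacez d'abord le disque dur
```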

A Study of Residual Adapters for Multi-Domain Neural Machine Translation

Domain adaptation is an old and vexing problem for machine translation systems. The most common and successful approach to supervised adaptation is to fine-tune a baseline system with in-domain parallel data. Standard fine-tuning, however, modifies all the network parameters, which makes this approach computationally costly and prone to overfitting. A recent, lightweight approach instead augments a baseline model with supplementary (small) adapter layers, keeping the rest of the model unchanged. This has the additional merit of leaving the baseline model intact, and adaptable to multiple domains. In this paper, we conduct a thorough analysis of the adapter model in the context of a multi-domain machine translation task. We contrast multiple …

MinhQuang Pham, Josep Maria Crego, François Yvon, Jean Senellart

Proceedings of the Fifth Conference on Machine Translation, November 2020
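A residual adapter of the kind analysed above is a small bottleneck layer added on top of a frozen baseline representation, with a residual connection so the baseline output passes through unchanged plus a learned correction. A minimal PyTorch sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=512, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, d_model)     # project back up

    def forward(self, h):
        # Residual: the frozen baseline output passes through unchanged,
        # plus a small, domain-specific learned correction.
        return h + self.up(torch.relu(self.down(self.norm(h))))

# One adapter per domain; only the adapters are trained, the base is frozen.
adapters = nn.ModuleDict({d: Adapter() for d in ["medical", "legal", "news"]})
h = torch.randn(8, 20, 512)          # hidden states from the frozen encoder
h_medical = adapters["medical"](h)
```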

Priming Neural Machine Translation

Priming is a well-known and well-studied psychological phenomenon in which the prior presentation of one stimulus (the cue) influences the processing of a response. In this paper, we propose a framework to mimic the process of priming in the context of neural machine translation (NMT). We evaluate the effect of using similar translations as priming cues on the NMT network. We propose a method to inject priming cues into the NMT network and compare our framework to other mechanisms that perform micro-adaptation during inference. Overall, experiments conducted in a multi-domain setting confirm that adding priming cues in the NMT decoder can go a long way towards improving the translation …

MinhQuang Pham, Jitao Xu, Josep Maria Crego, François Yvon, Jean Senellart

Proceedings of the Fifth Conference on Machine Translation, November 2020
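As a rough illustration of priming with similar translations: retrieve the translation-memory entry closest to the new input and expose its target side to the decoder as a cue. The retrieval metric, separator token, and function names below are assumptions for illustration, not the paper's injection mechanism.

```python
import difflib

translation_memory = [
    ("the engine does not start", "le moteur ne démarre pas"),
    ("press the start button", "appuyez sur le bouton de démarrage"),
]

def prime(source):
    """Retrieve the most similar TM entry and build priming cues from it."""
    best_src, best_tgt = max(
        translation_memory,
        key=lambda e: difflib.SequenceMatcher(None, source, e[0]).ratio())
    # The cue is prepended to both streams; at decoding time the network
    # can reuse wording from the cue's target side for the new input.
    primed_source = f"{best_src} ｟sep｠ {source}"
    decoder_cue = f"{best_tgt} ｟sep｠"
    return primed_source, decoder_cue

print(prime("the engine starts slowly"))
```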

Efficient and High-Quality Neural Machine Translation with OpenNMT

This paper describes the OpenNMT submissions to the WNGT 2020 efficiency shared task. We explore training and acceleration of Transformer models with various sizes that are trained in a teacher-student setup. We also present a custom and optimized C++ inference engine that enables fast CPU and GPU decoding with few dependencies. By combining additional optimizations and parallelization techniques, we create small, efficient, and high-quality neural machine translation models.

Guillaume Klein, Dakun Zhang, Clément Chouteau, Josep Crego, Jean Senellart

Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 211–217, Association for Computational Linguistics, July 2020
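The optimized C++ inference engine described above corresponds to the open-source CTranslate2 project (https://github.com/OpenNMT/CTranslate2). A minimal decoding example; the model path and tokens are placeholders.

```python
import ctranslate2

# Load a converted model (placeholder path) and decode on CPU.
translator = ctranslate2.Translator("ende_ctranslate2", device="cpu",
                                    inter_threads=1, intra_threads=4)
# CTranslate2 expects pre-tokenized input, e.g. SentencePiece pieces.
results = translator.translate_batch([["▁Hello", "▁world", "!"]], beam_size=2)
print(results[0].hypotheses[0])   # best hypothesis as a list of tokens
```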

Boosting Neural Machine Translation with Similar Translations

This presentation demonstrates data augmentation methods for Neural Machine Translation that make use of similar translations, much as a human translator employs fuzzy matches. We show how to simply feed the neural model information from both the source and target sides of the fuzzy matches, and we also extend the similarity to include semantically related translations retrieved using distributed sentence representations. We show that translations based on fuzzy matching provide the model with "copy" information, while translations based on embedding similarities tend to extend the translation "context". Results indicate that the effects of both kinds of similar sentences add up to further boost accuracy, …

Jitao Xu, Josep Crego, Jean Senellart

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), July 2020
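A bare-bones sketch of the fuzzy-match augmentation idea: when a translation-memory entry is similar enough to the input, its target side is appended to the source behind a reserved separator so the model can copy from it. The threshold, separator, and retrieval metric are illustrative assumptions (the embedding-based retrieval the abstract also mentions is not shown here).

```python
import difflib

tm = [("the cat sleeps on the sofa", "le chat dort sur le canapé")]

def augment(source, threshold=0.6):
    """Append the target side of a close-enough fuzzy match to the source."""
    score, match_tgt = max(
        (difflib.SequenceMatcher(None, source, s).ratio(), t) for s, t in tm)
    if score >= threshold:
        # The fuzzy target supplies "copy" material, as the abstract notes.
        return f"{source} ｟fuzzy｠ {match_tgt}"
    return source

print(augment("the cat sleeps on the bed"))
# -> the cat sleeps on the bed ｟fuzzy｠ le chat dort sur le canapé
```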

Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

Supervised machine translation works well when the train and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of Daumé III (2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while …

Minh Quang Pham, Josep Crego, François Yvon, Jean Senellart

Book: "International Workshop on Spoken Language Translation", "Proceedings of the 16th International Workshop on Spoken Language Translation (IWSLT)", November 2019, Hong-Kong, China