Papers and publications


With more than 50 years of experience in translation technologies, SYSTRAN has pioneered the greatest innovations in the field, including the first web-based translation portals and the first neural translation engines combining artificial intelligence and neural networks for businesses and public organizations.

SYSTRAN provides business users with advanced and secure automated translation solutions in areas such as global collaboration, multilingual content production, customer support, electronic investigation, Big Data analysis, and e-commerce. SYSTRAN offers a tailor-made solution with an open and scalable architecture that enables seamless integration into existing third-party applications and IT infrastructures.

Towards Example-Based NMT with Multi-Levenshtein Transformers

Maxime Bouthors, Josep Crego, François Yvon.

2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Dec 2023, Singapore.

BiSync: A Bilingual Editor for Synchronized Monolingual Texts

In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages. In the case of written communication, users with a good command of a foreign language may find assistance from computer-aided translation (CAT) technologies. These technologies often allow users to access external resources, such as dictionaries, terminologies or bilingual concordancers, thereby interrupting and considerably hindering the writing process. In addition, CAT systems assume that the source sentence is fixed and also restrict the possible changes on the target side. In order to make the writing process smoother, we present BiSync, a bilingual writing assistant that allows users to freely compose …

Josep Crego, Jitao Xu, François Yvon.

61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Jul 2023, Toronto, Canada.

Example-Based Machine Translation from Text to a Hierarchical Representation of Sign Language

This paper presents an experiment in automatic translation from text to sign language (SL). As we do not have a large aligned corpus, we have explored an example-based approach, using AZee, an intermediate representation of the discourse in SL in the form of hierarchical expressions.

Élise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Michael Filhol

18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (TALN 2023), Jun 2023, Paris, France.

Integrating Translation Memories into Non-Autoregressive Machine Translation

Jitao Xu, Josep Crego, François Yvon.

17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Association for Computational Linguistics, May 2023, Dubrovnik, Croatia.

Bilingual Synchronization: Restoring Translational Relationships with Editing Operations

Machine Translation (MT) is usually viewed as a one-shot process that generates the target language equivalent of some source text from scratch. We consider here a more general setting which assumes an initial target sequence, that must be transformed into a valid translation of the source, thereby restoring parallelism between source and target. For this bilingual synchronization task, we consider several architectures (both autoregressive and non-autoregressive) and training regimes, and experiment with multiple practical settings such as simulated interactive MT, translating with Translation Memory (TM) and TM cleaning. Our results suggest that one single generic edit-based system, once fine-tuned, can compare with, or even outperform, dedicated systems specifically trained for these tasks.

Jitao Xu, Josep Crego, François Yvon

The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Dec 2022, Abu Dhabi, United Arab Emirates.

Non-Autoregressive Machine Translation with Translation Memories

Non-autoregressive machine translation (NAT) has recently made great progress. However, most works to date have focused on standard translation tasks, even though some edit-based NAT models, such as the Levenshtein Transformer (LevT), seem well suited to translate with a Translation Memory (TM). This is the scenario considered here. We first analyze the vanilla LevT model and explain why it does not do well in this setting. We then propose a new variant, TM-LevT, and show how to effectively train this model. By modifying the data presentation and introducing an extra deletion operation, we obtain performance that is on par with an autoregressive approach, while reducing the decoding load. We also …

Jitao Xu, Josep Crego, François Yvon

The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Dec 2022, Abu Dhabi, United Arab Emirates.

Robust Translation of French Live Speech Transcripts

Despite a narrowed performance gap with direct approaches, cascade solutions involving automatic speech recognition (ASR) and machine translation (MT) are still largely employed in speech translation (ST). Direct approaches, which employ a single model to translate the input speech signal, suffer from the critical bottleneck of data scarcity. In addition, multiple industry applications display speech transcripts alongside translations, making cascade approaches more realistic and practical. In the context of cascaded simultaneous ST, we propose several solutions to adapt a neural MT network to take as input the transcripts output by an ASR system. Adaptation is achieved by enriching speech transcripts and MT data sets so that they more closely resemble each …

Elise Bertin-Lemée, Guillaume Klein, Josep Crego, Jean Senellart

Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), Sep 2022, Orlando, USA.

Latent Group Dropout for Multilingual and Multidomain Machine Translation

Multidomain and multilingual machine translation often rely on parameter sharing strategies, where large portions of the network are meant to capture the commonalities of the tasks at hand, while smaller parts are reserved to model the peculiarities of a language or a domain. In adapter-based approaches, these strategies are hardcoded in the network architecture, independent of the similarities between tasks. In this work, we propose a new method to better take advantage of these similarities, using a latent-variable model. We also develop new techniques to train this model end-to-end and report experimental results showing that the learned patterns are both meaningful and yield improved translation performance without any increase of …

Minh-Quang Pham, François Yvon, Josep Crego

Findings of the Association for Computational Linguistics: NAACL 2022, Jul 2022, Seattle, United States.

Example-based Multilinear Sign Language Generation from a Hierarchical Representation.

Boris Dauriac, Annelies Braffort, Elise Bertin-Lemée.

Jun 2022, Marseille, France.

Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies

Building effective Neural Machine Translation models often implies accommodating diverse sets of heterogeneous data so as to optimize performance for the domain(s) of interest. Such multi-source / multi-domain adaptation problems are typically approached through instance selection or reweighting strategies, based on a static assessment of the relevance of training instances with respect to the task at hand. In this paper, we study dynamic data selection strategies that are able to automatically re-evaluate the usefulness of data samples and to evolve a data selection policy in the course of training. Based on the results of multiple experiments, we show that such methods constitute a generic framework to automatically and effectively handle …

Minh-Quang Pham, Antoine Senellart, Dan Berrebbi, Josep Crego, Jean Senellart

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, Jun 2022, Ghent, Belgium.