Papers and publications

Company Information

With more than 50 years of experience in translation technologies, SYSTRAN has pioneered the greatest innovations in the field, including the first web-based translation portals and the first neural translation engines combining artificial intelligence and neural networks for businesses and public organizations.

SYSTRAN provides business users with advanced and secure automated translation solutions in areas such as global collaboration, multilingual content production, customer support, electronic investigation, Big Data analysis, and e-commerce. SYSTRAN offers a tailor-made solution with an open and scalable architecture that enables seamless integration into existing third-party applications and IT infrastructures.

Fast Approximate String Matching with Suffix Arrays and A* Parsing [PDF]

We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches, and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a factor of 100, with average lookup times of 4.3–247 ms for a segment in a realistic scenario.

Philipp Koehn, Jean Senellart

AMTA, October 2010.
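
The first step named in the abstract above, detecting exact n-gram matches with a suffix array, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the word-level tokenization, and the naive suffix-array construction are assumptions made for illustration, and the A* filtering and parsing stages are not shown.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction for illustration only; a production system would
    use a dedicated suffix-array construction algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_ngram(tokens, suffix_array, ngram):
    """Return the sorted corpus positions where `ngram` occurs exactly."""
    n = len(ngram)

    def bound(strict):
        # Binary search over suffixes, comparing only their first n tokens.
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_array[mid]
            prefix = tokens[start:start + n]
            if prefix < ngram or (strict and prefix == ngram):
                lo = mid + 1
            else:
                hi = mid
        return lo

    # All suffixes that begin with the n-gram form one contiguous block.
    return sorted(suffix_array[bound(False):bound(True)])


# Hypothetical toy corpus standing in for a translation memory.
corpus = "the cat sat on the mat the cat ran".split()
suffix_array = build_suffix_array(corpus)
matches = find_ngram(corpus, suffix_array, ["the", "cat"])
```

Because the suffixes are sorted, every occurrence of an n-gram sits in one contiguous block of the array, so each lookup needs only two binary searches regardless of how often the n-gram occurs.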

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems [PDF]

This report describes SYSTRAN’s Chinese-English and English-Chinese machine translation systems that participated in the CWMT2009 machine translation evaluation tasks. The base systems are SYSTRAN rule-based machine translation systems, augmented with various statistical techniques. Based on the translations of the rule-based systems, we perform statistical post-editing with the provided bilingual and monolingual training corpora. We describe the technology behind the systems, the training data, and the results in the CWMT2009 evaluation. Our primary systems were top-ranked in the evaluation tasks.

Jin Yang, Satoshi Enoue, Jean Senellart, Tristan Croiset

CWMT, November 2009.

Selective addition of corpus-extracted phrasal lexical rules to a rule-based machine translation system [PDF]

In this work, we show how an existing rule-based, general-purpose machine translation system may be improved and adapted automatically to a given domain, whenever parallel corpora are available. We perform this adaptation by extracting dictionary entries from the parallel data. From this initial set, the application of these rules is tested against the baseline performance. Rules are then pruned depending on sentence-level improvements and deteriorations, as evaluated by an automatic string-based metric. Experiments using the Europarl dataset show a 3% absolute improvement in BLEU over the original rule-based system.

Loïc Dugast, Jean Senellart, Philipp Koehn

MT Summit, August 2009.
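
The pruning step described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the exact decision rule, so the criterion used here (keep a rule when it improves more sentences than it degrades) is a guess, and the function name and data layout are hypothetical.

```python
def prune_rules(baseline_scores, rule_scores):
    """Keep candidate rules that help more sentences than they hurt.

    baseline_scores: {sentence_id: metric score without the rule}
    rule_scores:     {rule: {sentence_id: metric score with the rule applied}}

    The keep/drop criterion is an assumption, not the paper's rule.
    """
    kept = []
    for rule, scores in rule_scores.items():
        improved = sum(1 for sid, s in scores.items() if s > baseline_scores[sid])
        degraded = sum(1 for sid, s in scores.items() if s < baseline_scores[sid])
        if improved > degraded:
            kept.append(rule)
    return kept
```

In this sketch the per-sentence scores would come from an automatic string-based metric computed once for the baseline system and once with each candidate rule enabled.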

SMT and SPE Machine Translation Systems for WMT'09 [PDF]

This paper describes the development of several machine translation systems for the 2009 WMT shared task evaluation. We only consider the translation between French and English. We describe a statistical system based on the Moses decoder and a statistical post-editing system using SYSTRAN’s rule-based system. We also investigated techniques to automatically extract additional bilingual texts from comparable corpora.

Holger Schwenk, Sadaf Abdul Rauf, Loïc Barrault, Jean Senellart

March 2009

Statistical Post Editing and Dictionary Extraction: SYSTRAN/Edinburgh submissions for ACL-WMT2009 [PDF]

We describe the two SYSTRAN/University of Edinburgh submissions for WMT2009. They involve a statistical post-editing model with a particular handling of named entities (English to French and German to English) and the extraction of phrasal rules (English to French).

Loïc Dugast, Jean Senellart, Philipp Koehn

March 2009

Can we Relearn an RBMT System? [PDF]

This paper describes SYSTRAN’s submissions for the shared task of the third Workshop on Statistical Machine Translation at ACL. Our main contribution is a French-English statistical model trained without any human-translated parallel corpus. Instead, we translated a monolingual corpus with the SYSTRAN rule-based translation engine to produce the parallel corpus. The results are provided herein, along with some error analysis.

Loïc Dugast, Jean Senellart, Philipp Koehn

June 2008.

SYSTRAN Purely Neural MT Engines for WMT2017

This paper describes SYSTRAN’s systems submitted to the WMT 2017 shared news translation task for English-German, in both translation directions. Our systems are built using OpenNMT, an open-source neural machine translation system, implementing sequence-to-sequence models with LSTM encoder/decoders and attention. We experimented with automatically back-translated monolingual data. Our resulting models are further hyperspecialised with an adaptation technique that fine-tunes models according to the evaluation test sentences.

Yongchao Deng, Jungi Kim, Guillaume Klein, Catherine Kobus, Natalia Segal, Christophe Servan, Bo Wang, Dakun Zhang, Josep Crego, Jean Senellart

Published in "Proceedings of the Second Conference on Machine Translation", pages 265–270, Association for Computational Linguistics, 2017, Copenhagen, Denmark

SYSTRAN Translation Stylesheets: Machine Translation driven by XSLT [PDF]

XSL Transformation stylesheets are usually used to transform a document described in one XML formalism into another XML formalism, to modify an XML document, or to publish content stored in an XML document to a publishing format (XSL-FO, (X)HTML…). SYSTRAN Translation Stylesheets (STS) use XSLT to drive and control the machine translation of XML documents (native XML document formats or XML representations, such as XLIFF, of other kinds of document formats).

Pierre Senellart, Jean Senellart

September 2005

SYSTRAN Intuitive Coding Technology [PDF]

Customizing a general-purpose MT system is an effective way to improve machine translation quality for specific usages. Building a user-specific dictionary is the first and most important step in the customization process. An intuitive dictionary-coding tool was developed and is now used to let users build their dictionaries easily and intelligently. SYSTRAN’s innovative and proprietary IntuitiveCoding® technology is the engine powering this tool. It comprises various components: massive linguistic resources, a morphological analyzer, a statistical guesser, a finite-state automaton, and a context-free grammar. Methodologically, IntuitiveCoding® is also a cross-application approach for high-quality dictionary building in terminology import and exchange. This paper describes the various components and …

Jean Senellart, Jin Yang, Anabel Rebollo

MT Summit IX; September 22-26, 2003

SYSTRAN Review Manager [PDF]

The SYSTRAN Review Manager (SRM) is one of the components that make up the SYSTRAN Linguistics Platform (SLP), a comprehensive enterprise solution for managing MT customization and localization projects. The SRM is a productivity tool used for the review, quality assessment, and maintenance of linguistic resources combined with a SYSTRAN solution. The SRM is used in-house by SYSTRAN’s development team and is also licensed to corporate customers because it addresses leading linguistic challenges, such as terminology and homographs, which makes it a key component of the QA process. Extremely flexible, the SRM adapts to localization and MT customization projects of any size, from small to large scale. Its Web-based interface and multi-user architecture enable a …

Jean-Cédric Costa, Christiane Panissod

MT Summit IX; September 22-26, 2003.