Tags: localization, machine-learning, translation, globalization, machine-translation

Using existing human translations to aid machine translation to a new language


In the past, my company has used human, professional translators to translate our software from English into some 13 languages. It's expensive but the quality is high.

The application we're translating contains industry jargon. It also contains a lot of sentence fragments and single words which, out of context, are unlikely to be correctly translated.

I am wondering if there is a machine translation system or service that could use our existing professionally-generated translations to more accurately create a machine translation into any new language.

If an industry term, phrase or sentence fragment has already been translated from en-US to es-AR, pt-BR, cs-CZ, etc., couldn't those prior translations be used as hints about the correct word choice in some new language? They could be used, in a sense, to triangulate. At worst, they could drive a majority voting system (e.g. if machine-translating a phrase from 9 of the 13 existing languages yields the same result in the new language, we go with it).
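
The majority-vote idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `candidates` mapping is hypothetical and stands in for the output of whatever MT engine translates each existing human translation (the pivot) into the new language. No real translation API is called, and the sample strings are made up.

```python
from collections import Counter


def majority_vote(candidates, min_share=0.5):
    """Pick the most common candidate translation, if enough pivots agree.

    candidates maps a pivot locale (e.g. "es-AR") to the new-language
    translation obtained from that pivot. Returns (winner, share), or
    (None, share) when the share of votes does not exceed min_share.
    """
    if not candidates:
        return None, 0.0
    # Normalise lightly so case or whitespace differences do not split the vote.
    normalised = [text.strip().casefold() for text in candidates.values()]
    winner, votes = Counter(normalised).most_common(1)[0]
    share = votes / len(normalised)
    return (winner if share > min_share else None), share


# Hypothetical candidates for the term "Ledger" translated into it-IT.
candidates = {
    "en-US": "libro mastro",
    "es-AR": "Libro mastro",
    "pt-BR": "libro mastro",
    "cs-CZ": "registro",
}
print(majority_vote(candidates))  # ('libro mastro', 0.75)
```

A threshold such as `min_share` lets low-agreement strings be routed to a human reviewer instead of being accepted automatically.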

Is anyone aware of a machine translation service that works like this?


Solution

  • Yes. A lot has changed in the years since 2014.

    Now, as of 2023, more than a dozen cloud machine translation APIs support customization, and many of them are self-serve.

    For example, Google Translate launched customization in 2018. Its move to neural machine translation in 2016 had made customization a lot easier to offer.

    From machinetranslate.org/customisation:

    14 APIs support customisation.

    The type of customization you describe, training with your own parallel data, is the most common. These days it is usually achieved with fine-tuning.
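
As a rough sketch of what "your own parallel data" could look like in practice, the snippet below pairs existing string resources by key and writes tab-separated source/target segments. The resource dictionaries, the keys and the TSV layout are assumptions for illustration only. Each provider documents its own accepted upload formats, so check those before preparing data.

```python
import csv
import io


def to_parallel_rows(source, target):
    """Pair source and target strings that share a resource key.

    Keys missing on either side, and empty strings, are skipped so the
    output contains only complete segment pairs.
    """
    rows = []
    for key in sorted(source.keys() & target.keys()):
        src, tgt = source[key].strip(), target[key].strip()
        if src and tgt:
            rows.append((src, tgt))
    return rows


def write_tsv(rows, handle):
    """Write one source<TAB>target pair per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical string resources keyed by resource id (made-up sample data).
en_us = {"btn.save": "Save", "lbl.ledger": "Ledger", "msg.empty": ""}
es_ar = {"btn.save": "Guardar", "lbl.ledger": "Libro mayor"}

buffer = io.StringIO()
write_tsv(to_parallel_rows(en_us, es_ar), buffer)
print(buffer.getvalue())
```

Running the same export once per existing language pair gives one parallel file per pair, which is the kind of input that fine-tuning typically expects.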

    The basic types of customization that the major machine translation APIs offer are:

    • Fine-tuning - Training a model with parallel data
    • Adaptive - Similar to fine-tuning, but it updates on the fly
    • Glossaries - Defining specific terminology
    • Formality - Formal or informal 2nd person

    Other customization methods are also offered, such as choosing a specific locale (for example, Canadian French) or a specific domain (for example, fashion).

    There is a trade-off between simplicity and control: many of those methods are just a parameter or a button click, but an engine configured that way cannot reflect your style the way a model fine-tuned on your training data can.

    Warning: Even if an API supports customization, and that API is integrated in your translation management system, that integration does not necessarily support using your customized version of that API. So before you invest time in customization, check your TMS integration to be sure it'll actually let you access your custom machine translation.

    For example, there is no ModernMT integration in XTM. There is a Google Translate integration, but it supports customization only via fine-tuning, not via glossaries.


    Full disclosure: I'm the founder of and a major contributor to Machine Translate, the non-profit foundation making machine translation more accessible to more people, with open information and community.