google-translate language-translation machine-translation

Translating 1GB of text to English


I'm looking for a language translation API/solution that would fit my use case.

My use case is the following:

  • The data is 1 GB of free unstructured text written mostly in the world's common languages (French, Spanish, German, Russian, Korean). The language of each piece of text is known.
  • We can assume the text is grammatically correct and consists of complete sentences, but contains some uncommon words such as chemical compound names.
  • The text has to be translated to English.
  • The solution must be at least 10x cheaper than Google Translate, which charges $20 per 1M characters.
  • I would be willing to trade some of Google's quality for cost-effectiveness. Google, Yahoo, Microsoft, Yandex, and Online-Translator.com are all good enough, just too expensive.
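
To put the budget in numbers, here is a rough estimate. It assumes 1 GB means 10^9 bytes of UTF-8 text and that the average character takes between 1 and 3 bytes (roughly 1 for Latin-script text, 2 for Cyrillic, 3 for Hangul). I have not measured the actual mix, so these are bounds, not the real character count.

```python
GOOGLE_RATE = 20.0              # USD per 1M characters, as quoted above
TARGET_RATE = GOOGLE_RATE / 10  # "at least 10x cheaper"


def estimate_cost(num_chars: int, usd_per_million_chars: float) -> float:
    """Return the cost in USD of translating num_chars characters."""
    return num_chars / 1_000_000 * usd_per_million_chars


if __name__ == "__main__":
    total_bytes = 10**9  # assumption: 1 GB = 10^9 bytes
    # Assumption: average UTF-8 width of 1, 2 or 3 bytes per character.
    for bytes_per_char in (1, 2, 3):
        chars = total_bytes // bytes_per_char
        print(
            f"{bytes_per_char} byte(s)/char: {chars:,} chars, "
            f"Google ${estimate_cost(chars, GOOGLE_RATE):,.2f}, "
            f"target <= ${estimate_cost(chars, TARGET_RATE):,.2f}"
        )
```

In the worst case (one byte per character) that is $20,000 at Google's rate, so the whole job has to fit in about $2,000 or less.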

I've got a 16-CPU machine at my disposal, so offline translation is an option too.

Any suggestions?


Solution

  • For your volumes, machine translation prices range from $3 to $25 per 1M characters (with some outliers like ModernMT, which costs $eu per 1000 words).

    MT Price Comparison

    If you are willing to trade off a little quality, you may pick what we call "Optimal engines": those that are within the top 5% by performance but have the lowest price.

    Optimal general-purpose MT engines

    You may find more details in our Machine Translation report from July 2018.

    Then, you need to know which engines support your language pairs and deal with their APIs, request limits and quotas.

    You may use the Intento API to get a list of engines for your language pairs. You may then use the same API in async mode, in which case Intento takes care of all the limits. I am not sure it will handle a 1 GB file, but you're welcome to try.

    To avoid tinkering with the API requests, I would suggest using the CLI.
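
Whichever engine or client you end up with, a 1 GB file will most likely have to be split into request-sized pieces first. Below is a minimal sketch of that step in Python. It is not Intento's or any vendor's API: `translate` is a hypothetical callback you would write around the engine of your choice, and the 5000-character limit is an arbitrary placeholder, so check the real request limits of your provider.

```python
from typing import Callable, Iterable, Iterator


def iter_chunks(lines: Iterable[str], max_chars: int = 5000) -> Iterator[str]:
    """Group consecutive lines into chunks of at most max_chars characters.

    Chunks break on line boundaries where possible; a single line longer
    than max_chars is cut at hard character offsets.
    """
    buf: list[str] = []
    size = 0
    for line in lines:
        # Oversized line: flush what we have, then emit fixed-size slices.
        while len(line) > max_chars:
            if buf:
                yield "".join(buf)
                buf, size = [], 0
            yield line[:max_chars]
            line = line[max_chars:]
        if not line:
            continue
        # Start a new chunk if this line would overflow the current one.
        if buf and size + len(line) > max_chars:
            yield "".join(buf)
            buf, size = [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield "".join(buf)


def translate_file(
    src_path: str,
    dst_path: str,
    translate: Callable[[str], str],  # hypothetical wrapper around your engine
    max_chars: int = 5000,            # placeholder, not a real provider limit
) -> None:
    """Stream src_path through translate() chunk by chunk into dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for chunk in iter_chunks(src, max_chars):
            dst.write(translate(chunk))
```

Because the file is read line by line, memory use stays small regardless of the input size. Since the language of each piece of text is known, you would probably run this once per language so that each chunk goes to the engine with the right source language.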