
How to find out the quality of machine translation systems?

I know that there are various metrics for measuring the quality of machine translation systems, for example:

  • BLEU
  • LEPOR
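For reference, BLEU combines clipped n-gram precisions (up to 4-grams) with a brevity penalty that punishes translations shorter than the reference. The sketch below is a simplified, unsmoothed sentence-level version written for illustration; the function names are hypothetical. Published evaluations normally compute BLEU at corpus level with standardized tokenization, so scores from this sketch are not directly comparable to reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Simplified BLEU for one tokenized candidate against tokenized references.

    Uses clipped n-gram precision, uniform weights 1/max_n and the brevity
    penalty. There is no smoothing, so any zero precision yields 0.0.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty: compare against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


ref = "the cat is on the mat".split()
print(sentence_bleu("the cat is on the mat".split(), [ref]))  # identical: 1.0
print(sentence_bleu("the cat".split(), [ref], max_n=2))       # short: penalized
```

Clipping is what stops a degenerate output such as "the the the the" from scoring well on unigram precision, and the brevity penalty stops a very short but precise output from scoring well.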

Are metric results for popular translation systems publicly available anywhere? For example, for:

  • Google Translate
  • DeepL
  • Yandex Translate
  • Microsoft Translator
  • Promt
  • Apertium
  • Openlogs
  • Papago
  • Fanyi Baidu


  • Machine translation quality is evaluated annually at the Conference on Machine Translation (WMT). Most of the evaluated systems are experimental systems from universities, but most of the systems you mention participate as well. You can find the results of last year's human evaluation in Table 11 on page 24 of the conference findings.

    Most of the systems you mentioned participate anonymously under names of the form online-A, online-B, and so on, but you can often guess which system is which.