
NLP translation model gives me sentence translations instead of word translations


I trained a Transformer model on a Portuguese-English dataset from http://www.manythings.org/anki/. It is a parallel sentence dataset.

After training, I tried translating the word "doente", which should have been translated as "sick", but instead I got "I feel sick".

Any ideas on how I can get just the word "sick"?

Am I training my model on the wrong kind of dataset, i.e. sentence-based instead of word-based?

Thanks in advance.


Solution

  • Machine translation generally works on whole sentences, because the context in which a word is used affects its meaning. Word-for-word translation is of little use.

    What has most likely happened is that the word doente usually occurs in sentences whose English translation is "I feel sick"; that is the smallest context in which the model has seen it. To the machine these are all just strings of characters, so there is no 'understanding' that, from a human point of view, only the word "sick" corresponds to doente.

    If you want to translate words, use a bilingual dictionary; I doubt that there are word-based models for this, as decades of research in machine translation have shown that you need larger chunks of language for translating than just words.
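To see why the model answers with a whole sentence, it can help to look at what the training pairs actually contain for a given source word. The sketch below assumes the pairs have already been loaded as (English, Portuguese) tuples; the toy pairs are hypothetical examples written in the style of the Anki files, not lines taken from the real dataset, and `tokenize` / `targets_for_word` are illustrative helpers.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and split into word tokens; \w also matches accented letters in Python 3.
    return re.findall(r"\w+", text.lower())

def targets_for_word(pairs: list[tuple[str, str]], word: str) -> list[str]:
    """Return the English sides of pairs whose Portuguese side contains `word`,
    shortest (by token count) first."""
    hits = [en for en, pt in pairs if word in tokenize(pt)]
    return sorted(hits, key=lambda s: len(tokenize(s)))

# Hypothetical toy pairs (English, Portuguese); NOT taken from the real dataset.
pairs = [
    ("I feel sick.", "Eu me sinto doente."),
    ("Tom is sick.", "Tom está doente."),
    ("I was sick yesterday.", "Eu estava doente ontem."),
    ("I'm hungry.", "Estou com fome."),
]

targets = targets_for_word(pairs, "doente")
print(targets)  # ['I feel sick.', 'Tom is sick.', 'I was sick yesterday.']

# No target is the bare word "sick", so the model never learns to output it alone.
print(any(tokenize(t) == ["sick"] for t in targets))  # False
```

Every target the model was trained on for doente is a complete sentence, so a complete sentence is what it learns to produce.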
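One way to follow the dictionary suggestion without giving up the trained model is to send single words to a dictionary lookup and everything else to the sentence model. This is a minimal sketch under stated assumptions: `PT_EN_DICT` is a tiny hypothetical hard-coded dictionary standing in for a real bilingual dictionary resource, and `sentence_model` / `fake_model` are placeholders for whatever function wraps your trained Transformer.

```python
from typing import Callable

# Hypothetical Portuguese -> English entries; a real setup would load
# these from an actual bilingual dictionary instead of hard-coding them.
PT_EN_DICT: dict[str, list[str]] = {
    "doente": ["sick", "ill"],
    "casa": ["house", "home"],
}

def translate(text: str, sentence_model: Callable[[str], str]) -> str:
    """Route single known words to the dictionary, everything else to the model."""
    words = text.strip().split()
    if len(words) == 1:
        key = words[0].lower().strip(".,!?")
        if key in PT_EN_DICT:
            # A dictionary lists several senses; choosing one needs context,
            # which is what the sentence model supplies for full sentences.
            return " / ".join(PT_EN_DICT[key])
    return sentence_model(text)

def fake_model(text: str) -> str:
    # Hypothetical stand-in for the trained Transformer, for demonstration only.
    return "I feel sick."

print(translate("doente", fake_model))         # sick / ill
print(translate("Estou doente.", fake_model))  # I feel sick.
```

Returning all senses for a bare word, rather than guessing one, reflects the point above: without a sentence around it, there is no context from which to pick the right translation.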