Tags: tensorflow, nlp, gensim, word-embedding, chainer

Non-English Word Embeddings from English Word Embeddings


How can I generate non-English (French, Spanish, Italian) word embeddings from English word embeddings?

What are the best ways to generate high-quality word embeddings for non-English words?

The vocabulary may include hyphenated compound words such as samsung-galaxy-s9.


Solution

  • For non-English words, you can try using a bilingual dictionary: translate the English words that already have embedding vectors, and give each translation the vector of its English counterpart.

    High-quality word embeddings require a large training corpus. To train non-English embeddings directly, add bilingual constraints to the original word2vec (w2v) loss and train on bilingual corpora.

    Depending on your application, you can treat a compound word such as samsung-galaxy-s9 as a single token or split it into its parts.
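As a rough illustration of the dictionary approach and the compound-word choice, here is a minimal sketch using only NumPy. The vectors, the French-to-English dictionary, and the helper names `transfer_embeddings` and `compound_vector` are all made up for this example; in practice you would load pretrained English vectors and a real bilingual lexicon. Averaging the vectors of several translations, and averaging the parts of a split compound, are just simple assumptions, not the only options. It does not cover the joint training with bilingual constraints.

```python
import numpy as np

# Toy English embeddings (made-up values, for illustration only).
english_vectors = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "cat": np.array([0.8, 0.2, 0.1]),
    "phone": np.array([0.0, 0.9, 0.3]),
    "samsung": np.array([0.1, 0.8, 0.5]),
    "galaxy": np.array([0.2, 0.7, 0.6]),
    "s9": np.array([0.0, 0.6, 0.7]),
}

# Hypothetical bilingual dictionary: French word -> English translations.
fr_to_en = {
    "chien": ["dog"],
    "chat": ["cat"],
    "téléphone": ["phone"],
    "inconnu": ["unknown"],  # translation has no English vector
}


def transfer_embeddings(bilingual_dict, source_vectors):
    """Give each target word the mean vector of its known translations."""
    target_vectors = {}
    for word, translations in bilingual_dict.items():
        known = [source_vectors[t] for t in translations if t in source_vectors]
        if known:  # skip words whose translations have no vector
            target_vectors[word] = np.mean(known, axis=0)
    return target_vectors


def compound_vector(token, vectors, sep="-"):
    """Use the whole token if it has a vector, else average its parts."""
    if token in vectors:
        return vectors[token]
    parts = [vectors[p] for p in token.split(sep) if p in vectors]
    return np.mean(parts, axis=0) if parts else None


french_vectors = transfer_embeddings(fr_to_en, english_vectors)
s9_vector = compound_vector("samsung-galaxy-s9", english_vectors)
```

Here `french_vectors` ends up with entries for "chien", "chat" and "téléphone" only, and `s9_vector` is the average of the three part vectors because the full compound has no vector of its own. If the compound matters as a unit in your application, keep it as a single token during training instead.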