java nlp word2vec polish

Is there any Polish implementation for similar words in word2vec?


I found the GoogleNews-vectors-negative300.bin model, but it only covers English words. Is there a Polish equivalent that I can use to look up similar words with word2vec?

I have already tried the cc.pl.300.bin and NKJP-PodkorpusMilionowy files...

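    import java.io.File;
    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;

    // Loads the pretrained (English) GoogleNews vectors with DL4J.
    public Word2Vec getWord2Vec() {
        File gModel = new File("C:/Users/user/Desktop/GoogleNews-vectors-negative300.bin.gz");
        return WordVectorSerializer.readWord2VecModel(gModel);
    }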

Solution

  • The file...

    https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.pl.vec

    ...as linked from...

    https://fasttext.cc/docs/en/pretrained-vectors.html

    ...may work for you, if your library can load the plain-text format commonly used for exchanging word vectors. (Unlike your cc.pl.300.bin file, it is not in Facebook's fastText-specific binary format.)
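
To make the 'text' format concrete, here is a minimal plain-Java sketch that reads such a file and ranks neighbours by cosine similarity. It assumes the usual layout of these files: a header line with the vocabulary size and the dimension, then one line per word with the word followed by its vector components. The class name TextVectors, the local path wiki.pl.vec, the 100,000-word cap and the query word "kot" are all illustrative assumptions, and nothing here comes from DL4J or fastText.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of DL4J or fastText.
class TextVectors {

    // Parses the text format: a "<vocabSize> <dim>" header line,
    // then one "<word> <v1> ... <vDim>" line per word.
    static Map<String, float[]> parse(BufferedReader reader, int maxWords) throws IOException {
        String headerLine = reader.readLine();
        if (headerLine == null) throw new IOException("empty vector file");
        int dim = Integer.parseInt(headerLine.trim().split(" ")[1]);
        Map<String, float[]> vectors = new LinkedHashMap<>();
        String line;
        while (vectors.size() < maxWords && (line = reader.readLine()) != null) {
            String[] parts = line.trim().split(" ");
            if (parts.length != dim + 1) continue; // skip malformed lines
            float[] v = new float[dim];
            for (int i = 0; i < dim; i++) v[i] = Float.parseFloat(parts[i + 1]);
            vectors.put(parts[0], v);
        }
        return vectors;
    }

    // Cosine similarity; the small epsilon avoids division by zero.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    // Returns up to n words most similar to `word`, or an empty list
    // if the word is not in the loaded vocabulary.
    static List<String> nearest(Map<String, float[]> vectors, String word, int n) {
        float[] query = vectors.get(word);
        List<String> candidates = new ArrayList<>();
        if (query == null) return candidates;
        for (String w : vectors.keySet()) {
            if (!w.equals(word)) candidates.add(w);
        }
        candidates.sort(Comparator.comparingDouble((String w) -> -cosine(query, vectors.get(w))));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) throws IOException {
        // Assumed local path to a downloaded copy of wiki.pl.vec.
        try (BufferedReader r = Files.newBufferedReader(Paths.get("wiki.pl.vec"), StandardCharsets.UTF_8)) {
            Map<String, float[]> vectors = parse(r, 100_000); // cap the vocabulary to limit memory use
            System.out.println(nearest(vectors, "kot", 10));
        }
    }
}
```

If you stay with DL4J, pointing the File in your getWord2Vec() method at the downloaded wiki.pl.vec and then calling wordsNearest("kot", 10) on the returned model may be all you need, assuming readWord2VecModel recognises the text format in your DL4J version.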