python-3.x, word2vec, spacy

Loading pre-trained word embeddings


I am trying to load the pre-trained word2vec model with the command below, but I get a Unicode error. I searched around but could not find a working solution. Any help getting to the bottom of it would be appreciated.

python -m spacy init-model en /tmp/google_news_vectors --vectors-loc ~/Downloads/GoogleNews-vectors-negative300.bin.gz


UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 7: invalid start byte

Solution

  • spaCy expects the vectors in the word2vec text format rather than the binary format, so the binary file cannot be decoded as UTF-8 text:

    https://spacy.io/api/cli#init-model

    For how to convert the binary model, see: https://stackoverflow.com/a/33183634/461847
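
As an illustration of the conversion step, here is a minimal standard-library sketch. It is not necessarily the method used in the linked answer. It assumes the usual word2vec binary layout: a text header line with the vocabulary size and dimension count, followed by one record per word made of the word, a single space, and little-endian float32 values. The function name convert_word2vec_bin_to_text and the output path are hypothetical.

```python
import gzip
import struct


def convert_word2vec_bin_to_text(src_path, dst_path):
    """Convert word2vec binary vectors (plain or gzipped) to the text format.

    Assumes the common word2vec binary layout; not verified against every
    tool that writes such files.
    """
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rb") as src, open(dst_path, "w", encoding="utf-8") as dst:
        # Header line: "<vocab_size> <dimensions>\n"
        vocab_size, dims = (int(x) for x in src.readline().split())
        dst.write(f"{vocab_size} {dims}\n")

        fmt = f"<{dims}f"  # dims little-endian float32 values
        size = struct.calcsize(fmt)

        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space.
            # Some files put a newline after each vector; skip it.
            chars = bytearray()
            while True:
                ch = src.read(1)
                if not ch:
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")

            values = struct.unpack(fmt, src.read(size))
            dst.write(word + " " + " ".join(f"{v:.6f}" for v in values) + "\n")


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the files actually live.
    convert_word2vec_bin_to_text(
        "GoogleNews-vectors-negative300.bin.gz",
        "GoogleNews-vectors-negative300.txt",
    )
```

The resulting text file can then be passed to --vectors-loc in place of the .bin.gz file. Reading one byte at a time is slow for a large vocabulary, so expect this sketch to take a while on the full GoogleNews model.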