
Load Gensim word vectors into a spaCy pipeline


I've generated a Word2Vec model with gensim, but I have a hard time using it in my spaCy pipeline.

    python -m spacy init vectors de w2v-model-v1.txt.gz path/SpacyModel

This creates a model I can load, but its only component is the vectors. I am using the model de_core_news_lg with custom pipeline components and would like to simply replace the standard vectors with my custom-trained vectors.
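If you want to keep using the output of spacy init vectors, one option is to load that vectors-only model and copy its vectors table into the full pipeline, using the same nlp.vocab.vectors assignment as the solution. A minimal sketch, assuming spaCy v3: to stay runnable without downloads, it builds a small stand-in for path/SpacyModel in a temporary directory from two random vectors (hypothetical data), and uses spacy.blank("de") where you would use spacy.load("de_core_news_lg").

```python
import tempfile
from pathlib import Path

import numpy
import spacy

rng = numpy.random.default_rng(0)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the vectors-only model written by `spacy init vectors`
    vectors_only = spacy.blank("de")
    vectors_only.vocab.set_vector("hund", rng.random(50).astype("float32"))
    vectors_only.vocab.set_vector("katze", rng.random(50).astype("float32"))
    out_dir = Path(tmp) / "SpacyModel"
    vectors_only.to_disk(out_dir)

    # Load the vectors-only model, as you would load path/SpacyModel
    custom = spacy.load(out_dir)

    # Stand-in for spacy.load("de_core_news_lg") so the sketch runs offline
    nlp = spacy.blank("de")

    # Replace the pipeline's vectors table with the custom one
    nlp.vocab.vectors = custom.vocab.vectors

print(nlp.vocab.vectors.shape, nlp("hund")[0].has_vector)
```

Only the vectors table is copied; the pipeline's own components and settings stay as they were.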


Solution

  • I used the vectors in an existing pipeline by adding each gensim vector to a new Vocab and then assigning that vocab's vectors table to the pipeline's vocab.

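    import spacy
    from gensim.models import Word2Vec
    from spacy.vocab import Vocab
    
    # Pipeline whose standard vectors should be replaced
    nlp = spacy.load("de_core_news_lg")
    
    # Path to the trained gensim model (placeholder file name)
    gensim_model = Word2Vec.load("my_w2vmodel.model")
    
    # Copy every gensim vector into a fresh spaCy Vocab
    vocab = Vocab()
    for word in gensim_model.wv.index_to_key:
        vector = gensim_model.wv.get_vector(word)
        vocab.set_vector(word, vector)
    
    # Replace the pipeline's vectors table with the custom one
    nlp.vocab.vectors = vocab.vectors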