Tags: python, numpy, gensim, doc2vec, pre-trained-model

Unable to load pre-trained gensim Doc2Vec from publication data


I want to use a pre-trained Doc2Vec model from a published paper.

Paper

Whalen, R., Lungeanu, A., DeChurch, L., & Contractor, N. (2020). Patent Similarity Data and Innovation Metrics. Journal of Empirical Legal Studies, 17(3), 615–639. https://doi.org/10.1111/jels.12261

Code

https://github.com/ryanwhalen/patent_similarity_data

Data

https://zenodo.org/record/3552078#.YeWkFvgxmUk

However, loading the model (patent_doc2v_10e.model) raises an error. Edit: The file can be downloaded from the data repository (link above). I am neither the author of the paper nor the creator of the model.

from gensim.models.doc2vec import Doc2Vec
model = Doc2Vec.load("patent_doc2v_10e.model")


FileNotFoundError: [Errno 2] No such file or directory: 'patent_doc2v_10e.model.trainables.syn1neg.npy'

Am I missing files or do I have to load the model in other ways?


Solution

  • Where did the file patent_doc2v_10e.model come from?

    If loading that file raises an error about another file named patent_doc2v_10e.model.trainables.syn1neg.npy, then that other file is a necessary part of the full model. It should have been created alongside patent_doc2v_10e.model when the model was first persisted to disk with .save().

    You'll need to go back to where patent_doc2v_10e.model was created and find the missing patent_doc2v_10e.model.trainables.syn1neg.npy file (and possibly others whose names also start with patent_doc2v_10e.model…). All files created by the same .save() call must be kept or moved together, in the same directory, for any later .load() to succeed.

    (Additionally, if you are training such models yourself from the original data, I'd suggest using a current version of Gensim. Only older pre-4.0 versions create save files with trainables in the name.)
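
As a rough illustration of the point above, the following sketch lists every file that shares the model's name prefix and refuses to call .load() until the expected side files sit next to the main file. The helper names list_model_files and load_doc2vec_checked are hypothetical, and the default suffix is only the one named in the error message from the question; a given model may need other side files as well.

```python
from pathlib import Path


def list_model_files(model_path):
    """Return the sorted names of files in the model's directory whose
    names start with the model's file name (the model file and its side files)."""
    model_path = Path(model_path)
    return sorted(
        p.name
        for p in model_path.parent.iterdir()
        if p.is_file() and p.name.startswith(model_path.name)
    )


def load_doc2vec_checked(model_path,
                         required_suffixes=(".trainables.syn1neg.npy",)):
    """Load a Doc2Vec model only if the expected side files are present.

    required_suffixes is an assumption: it defaults to the single suffix
    seen in the FileNotFoundError from the question, not a complete list.
    """
    model_path = Path(model_path)
    present = set(list_model_files(model_path))
    missing = [
        model_path.name + suffix
        for suffix in required_suffixes
        if model_path.name + suffix not in present
    ]
    if missing:
        raise FileNotFoundError(
            "Missing side files next to %s: %s" % (model_path, ", ".join(missing))
        )
    # Imported here so the file check above runs even without gensim installed.
    from gensim.models.doc2vec import Doc2Vec
    return Doc2Vec.load(str(model_path))
```

Calling list_model_files("patent_doc2v_10e.model") in the download directory shows at a glance which pieces of the saved model are actually there. If only the bare .model file appears, the remaining files have to be fetched from the same source before loading can work.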