Tags: python, nlp, gensim, word2vec

Gensim 4.2.0 downloader function is missing


I'm using the Gensim package. However, when I try to load a pretrained word2vec model, gensim.downloader does not seem to exist.

w2v = gensim.downloader.load('word2vec-google-news-300')

I got this error message:

AttributeError: module 'gensim' has no attribute 'downloader'

I listed the attributes of gensim with dir(gensim), and here is what I got:

['__builtins__','__cached__','__doc__','__file__','__loader__','__name__','__package__','__path__','__spec__','__version__','_matutils','corpora','interfaces','logger','logging','matutils','models','parsing','similarities','topic_coherence','utils']

It seems that downloader is not among the listed attributes. Is there another way to download a specific pretrained model with the gensim library, and what is wrong with gensim.downloader?

My gensim version is 4.2.0.


Solution

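  • If you're following some example code, copy its imports and code exactly. I don't think you'll find any docs or examples that use the gensim.downloader module the way you've attempted: they import the submodule explicitly (for example, import gensim.downloader as api, then api.load('word2vec-google-news-300')). A bare import gensim, which your error suggests you used, does not load that submodule. That is why dir(gensim) doesn't list it and the attribute lookup fails.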

    More generally: I recommend against using gensim.downloader. It hides the actual sources, local paths, and return types of the data it retrieves, and it also runs new code from the net that is neither under the Gensim project's source control nor part of versioned Gensim releases. (It's a sketchy software-engineering practice.)

    Instead, download the GoogleNews dataset directly from some host, saving the exact original file(s) to a specific place of your choosing. Examine the downloads to understand their filenames/formats (decompressing if necessary).

    Then use other Gensim methods – such as KeyedVectors.load_word2vec_format() – to load from a specific known local file path, with a returned object of a specific documented type.

    Your code (and your own understanding) will be clearer, more robust, and more secure.
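A minimal sketch of the "examine, then load from a known path" approach described above. The helper names read_word2vec_header and load_vectors are hypothetical, as is any file location you pass in. The sketch assumes the download is in the standard word2vec format, whose first line is an ASCII "<vocab_size> <vector_size>" header in both the text and binary variants.

```python
import gzip
from pathlib import Path


def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from the first line of a word2vec-format file.

    Hypothetical helper for inspecting a download before loading it.
    Files whose name ends in .gz are read through gzip.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as f:
        header = f.readline()
    vocab_size, vector_size = (int(field) for field in header.decode("ascii").split())
    return vocab_size, vector_size


def load_vectors(path, binary=True, limit=None):
    """Load vectors from a known local file; returns a gensim KeyedVectors object.

    Hypothetical wrapper around KeyedVectors.load_word2vec_format().
    `limit` keeps only the first N vectors, which reduces memory use.
    """
    # Imported here so the header check above also works without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(path), binary=binary, limit=limit)
```

Assuming you saved the GoogleNews vectors somewhere of your choosing (the file is commonly distributed as GoogleNews-vectors-negative300.bin.gz, but check what you actually downloaded), you would first call read_word2vec_header(path) to confirm the vocabulary size and dimensionality, then load_vectors(path, binary=True, limit=500_000) to get a KeyedVectors object with the most frequent words.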