python, gensim, word2vec

Why doesn't gensim's Word2Vec recognize 'compute_loss' keyword?


According to the gensim.models.Word2Vec API reference, "compute_loss" is a valid keyword. However, I get an error that says it's an unexpected keyword.

UPDATE:

The Word2Vec class on GitHub does have the 'compute_loss' keyword, but my local library does not, so the gensim documentation and the released library have diverged. I also found that the win-64/gensim-2.2.0-np113py35_0.tar.bz2 package in the conda repository is not up to date.

However, after uninstalling gensim with conda and running pip install gensim, nothing changed: the error persists.

Apparently, the source on GitHub and the distributed library are different, but the tutorial seems to assume code is as on GitHub.
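
One way to confirm which gensim build the interpreter actually imports after switching between conda and pip is to print its version and file location. This is only a minimal sketch: describe_module is a hypothetical helper name, not part of gensim, and it assumes the package exposes a __version__ attribute (it falls back to "unknown" otherwise).

```python
import importlib
import importlib.util


def describe_module(name):
    """Return (version, file path) for an importable module, or None if it is absent."""
    # find_spec returns None when no top-level module with this name can be found
    if importlib.util.find_spec(name) is None:
        return None
    try:
        module = importlib.import_module(name)
    except ImportError:
        # the package was found but could not be imported
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", None)
    return version, location


# Prints None if gensim is not installed in the active environment
print(describe_module("gensim"))
```

If the reported path still points at the conda package directory after pip install, the old copy is probably the one being imported.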

/END OF UPDATE

I followed and downloaded the tutorial notebook on Word2Vec.

In input [25], the first cell after the "Training Loss Computation" headline, I get an error in the Word2Vec class's initializer.

Input:

# instantiating and training the Word2Vec model
model_with_loss = gensim.models.Word2Vec(sentences, min_count=1,
                                         compute_loss=True, hs=0, sg=1, seed=42)

# getting the training loss value
training_loss = model_with_loss.get_latest_training_loss()
print(training_loss)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-c2933abf4b08> in <module>()
      1 # instantiating and training the Word2Vec model
----> 2 model_with_loss = gensim.models.Word2Vec(sentences, min_count=1, compute_loss=True, hs=0, sg=1, seed=42)
      3 
      4 # getting the training loss value
      5 training_loss = model_with_loss.get_latest_training_loss()

TypeError: __init__() got an unexpected keyword argument 'compute_loss'

I have gensim 2.2.0 installed via conda and a fresh clone of the gensim repository (with the tutorial notebook). I'm using 64-bit Python 3.5.3 (Anaconda) on Windows 10.

I've searched for others who have encountered the same error, but without success.

Do you know the reason for this, and how to fix it?

I've also previously posted the question in the official mailing list.


Solution

  • UPDATE: compute_loss was added in version 2.3.0, on July 25th. /UPDATE

    The notebook referenced in the question is on the develop branch. The master branch has a notebook that is consistent with the latest distribution.

    The compute_loss parameter was added in this commit, on June 19. As of today, the last upload to PyPI was on June 21, only two days later, yet compute_loss is not included in that distribution. (The last commit in v2.2.0 is this.)

    I assume the solution is to wait for the next version of gensim, and to install the code from the repository in the meantime.

    However, this might make it harder to get the FAST version of gensim to work, at least on Windows. See Using Gensim shows "Slow version of gensim.models.doc2vec being used".

    How to install gensim from GitHub is explained in their install documentation.
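
Until a release with compute_loss is installed, a notebook can check whether the constructor accepts the keyword before passing it. The following is only a sketch under stated assumptions: accepts_keyword and supported_kwargs are hypothetical helper names, and OldModel and NewModel are stand-in classes that imitate a constructor without and with compute_loss; they are not gensim code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares parameter `name` or takes **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` accepts."""
    return {k: v for k, v in kwargs.items() if accepts_keyword(func, k)}


class OldModel:
    """Hypothetical stand-in for a release whose initializer lacks compute_loss."""

    def __init__(self, sentences=None, min_count=5, hs=0, sg=0, seed=1):
        pass


class NewModel:
    """Hypothetical stand-in for a release whose initializer has compute_loss."""

    def __init__(self, sentences=None, min_count=5, hs=0, sg=0, seed=1,
                 compute_loss=False):
        pass


for cls in (OldModel, NewModel):
    # Filter the arguments first so the call cannot raise the TypeError above
    kwargs = supported_kwargs(cls, min_count=1, compute_loss=True, hs=0, sg=1, seed=42)
    model = cls(None, **kwargs)
    print(cls.__name__, "compute_loss" in kwargs)
```

With a real installation, the same check would be accepts_keyword(gensim.models.Word2Vec, "compute_loss"). Note that silently dropping the keyword also means get_latest_training_loss() may not be available, so the notebook cell would still need the newer version to report a loss.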