Tags: jupyter-notebook, google-colaboratory, gensim, doc2vec

'ConcatenatedDoc2Vec' object has no attribute 'docvecs'


I am a beginner in Machine Learning and am trying Document Embedding for a university project. I work with Google Colab and Jupyter Notebook (via Anaconda). The problem is that my code runs perfectly in Google Colab, but if I execute the same code in Jupyter Notebook (via Anaconda) I run into an error with the ConcatenatedDoc2Vec object.

With this function I build the vector features for a Classifier (e.g. Logistic Regression).

def build_vectors(model, length, vector_size):
    vector = np.zeros((length, vector_size))
    for i in range(0, length):
        prefix = 'tag' + '_' + str(i)
        vector[i] = model.docvecs[prefix]
    return vector

I concatenate two Doc2Vec models (d2v_dm, d2v_dbow). Both work perfectly throughout the whole code and have no problems with the function build_vectors():

d2v_combined = ConcatenatedDoc2Vec([d2v_dm, d2v_dbow])

But if I run the function build_vectors() with the concatenated model:

#Compute combined Vector size
d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size

d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)

I receive this error (but only if I run this in Jupyter Notebook (via Anaconda) -> no problem with this code in the Notebook in Google Colab):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [20], in <cell line: 4>()
      1 #Compute combined Vector size
      2 d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
----> 4 d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)

Input In [11], in build_vectors(model, length, vector_size)
      3 for i in range(0, length):
      4     prefix = 'tag' + '_' + str(i)
----> 5     vector[i] = model.docvecs[prefix]
      6 return vector

AttributeError: 'ConcatenatedDoc2Vec' object has no attribute 'docvecs'

This is mysterious to me: the code works in Google Colab but not in Jupyter Notebook (via Anaconda), and I have not found anything on the web that solves my problem.


Solution

  • If it's working one place, but not the other, you're probably using different versions of the relevant libraries – in this case, gensim.

    Does the following show exactly the same version in both places?

    import gensim
    print(gensim.__version__)
    

    If not, the most immediate workaround would be to make the place where it doesn't work match the place that it does, by force-installing the same explicit version – pip install gensim==VERSION (where VERSION is the target version) – then ensuring your notebook is restarted to see the change.

    Beware, though, that unless starting from a fresh environment, this could introduce other library-version mismatches!

    Other things to note:

    • Last I looked, Colab was using an over-4-year-old version of Gensim (3.6.0), despite more-recent releases with many fixes & performance improvements. It's often best to stay at or closer-to the latest versions of any key libraries used by your project; this answer describes how to trigger the installation of a more-recent Gensim at Colab. (Though of course, the initial effects of that might be to cause the same breakage in your code, adapted for the older version, at Colab.)
    • In more-recent Gensim versions, the property formerly called docvecs is now called just dv - so some older code erroring this way may only need docvecs replaced with dv to work. (Other tips for migrating older code to the latest Gensim conventions are available at: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4 )
    • It's unclear where you're pulling the ConcatenatedDoc2Vec class from. A class of that name exists in some Gensim demo/test code, as a very minimal shim class that was at one time used in attempts to reproduce the results of the original "Paragraph Vector" (aka Doc2Vec) paper. But beware: that's not a usual way to use Doc2Vec, & the class of that name I know barely does anything outside its original narrow purpose.
    • Further, beware that as far as I know, no one has ever reproduced the full claimed performance of the two-kinds-of-doc-vectors-concatenated approach reported in that paper, even using the same data/described-technique/evaluation. The claimed results likely relied on some other undisclosed techniques, or some error in the writeup. So if you're trying to mimic that, don't get too frustrated. And know most uses of Doc2Vec just pick one mode.
    • If you have your own separate reasons for creating concatenated feature-vectors, from multiple algorithms, you should probably write your own code for that, not limited to the peculiar two-modes-of-Doc2Vec code from that one experiment.
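
As a rough sketch of the last two points: assuming your models are ordinary Doc2Vec instances whose documents are tagged 'tag_0', 'tag_1', and so on as in the question, something like the following could stand in for both ConcatenatedDoc2Vec and build_vectors(). The helper names doc_vectors and build_concatenated_vectors are made up for illustration and are not part of Gensim. The lookup tries dv first and falls back to docvecs, so the same code should work with either naming convention.

```python
import numpy as np


def doc_vectors(model):
    """Return the per-document vector lookup of a Doc2Vec-like model.

    Newer Gensim versions call this property `dv`; older ones call it
    `docvecs`. Try the newer name first, then fall back to the older one.
    """
    lookup = getattr(model, "dv", None)
    if lookup is None:
        lookup = getattr(model, "docvecs", None)
    if lookup is None:
        raise AttributeError(
            f"{type(model).__name__} has neither 'dv' nor 'docvecs'"
        )
    return lookup


def build_concatenated_vectors(models, length, tag_prefix="tag_"):
    """Build a (length, total_vector_size) feature matrix.

    Row i is the doc-vector for tag f"{tag_prefix}{i}" from each model,
    joined end to end in the order the models are given.
    Assumes length >= 1 and that every model was trained with these tags.
    """
    lookups = [doc_vectors(m) for m in models]
    rows = [
        np.concatenate([lookup[f"{tag_prefix}{i}"] for lookup in lookups])
        for i in range(length)
    ]
    return np.vstack(rows)
```

With the names from the question, the call would then be d2v_combined_vec = build_concatenated_vectors([d2v_dm, d2v_dbow], len(X_tagged)). The combined vector size no longer has to be computed by hand, since it falls out of the concatenation.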