python-3.x word-embedding doc2vec

Semantic and syntactic performance of a Doc2Vec model


I am trying to check the semantic and syntactic performance of a doc2vec model with doc2vec_model.accuracy(questions-words), but it doesn't work. The gensim documentation page "models.deprecated.doc2vec – Deep learning with paragraph2vec" says it has been deprecated since version 3.3.0 of the package. The call gives this error message:

AttributeError: 'Doc2Vec' object has no attribute 'accuracy'

The same call works well with a word2vec model. Is there any way to get this done other than doc2vec_model.accuracy(questions-words), or is it impossible?


Solution

  • A few notes:

    That 'accuracy()' test is only a test of word-vectors on analogy problems – an easy evaluation to run, used in a number of papers, but not the final authority on whether a set of word-vectors is better than others for a particular purpose. (When I've had a project-specific scoring method, sometimes the word-vectors that score best on project-specific goals don't score best on those analogies – especially if the word-vectors are being used for a classification or information-retrieval task.)
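To make concrete what an analogy test measures, here is a minimal pure-Python sketch. It is not gensim's implementation: the function names and the toy vectors are made up for illustration, and it skips details such as vector normalization and vocabulary restriction. Each question "a b c d" counts as correct when the word nearest to vec(b) - vec(a) + vec(c), excluding a, b and c, is d.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of answerable questions solved; unknown-word questions are skipped."""
    answerable = [q for q in questions if all(w in vectors for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d
                  for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical toy vectors, chosen by hand so the analogies work out.
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [-1.0, 0.2, 0.1],
}
toy_questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "woman", "prince", "princess"),  # skipped: words not in toy_vectors
]

print(analogy_accuracy(toy_vectors, toy_questions))
```

A score like this only says how well the vectors reproduce these particular word relationships, which is why it may disagree with a project-specific evaluation.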

    Further, the popular and fast PV-DBOW Doc2Vec mode (dm=0 in gensim) doesn't train word-vectors at all, unless you add another setting (dbow_words=1). Such untrained word-vectors will be in random locations, scoring awfully on the analogies-accuracy.

    But, using either PV-DM (dm=1) mode, or adding dbow_words=1 to PV-DBOW, will get word-vectors from Doc2Vec, and you might still want to run the analogies test. Fortunately, analogy-evaluation options have been retained & even expanded on the KeyedVectors object that's held in the Doc2Vec wv property. You can call the old accuracy() method there:

    https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.accuracy

    But there's also a slightly different scoring method, evaluate_word_pairs():

    https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.evaluate_word_pairs

    (And in the 4.0.0 release there'll be an evaluate_word_analogies() method, which replaces accuracy().)
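Putting the notes above together, a small helper could route the analogy test through the model's wv property instead of calling it on the Doc2Vec object. This is a sketch under assumptions: the helper name analogy_score is made up, and the return shapes are as I understand them (evaluate_word_analogies() returning a (score, sections) pair, and the older accuracy() returning a list of section dicts whose last entry is the total, with 'correct' and 'incorrect' lists). Check them against the gensim version you have installed.

```python
def analogy_score(doc2vec_model, questions_path):
    """Return overall analogy accuracy (0.0 to 1.0) for a model's word-vectors.

    `doc2vec_model` is expected to expose a `wv` attribute (the KeyedVectors);
    `questions_path` points to an analogy file such as questions-words.txt.
    """
    wv = doc2vec_model.wv  # the word-vectors live here, not on the model itself

    # Newer gensim: assumed to return (overall_score, per_section_details).
    if hasattr(wv, "evaluate_word_analogies"):
        score, _sections = wv.evaluate_word_analogies(questions_path)
        return score

    # Older gensim: accuracy() is assumed to return a list of section dicts,
    # the last one holding the totals as lists of correct/incorrect questions.
    sections = wv.accuracy(questions_path)
    total = sections[-1]
    n_correct = len(total["correct"])
    n_answered = n_correct + len(total["incorrect"])
    return n_correct / n_answered if n_answered else 0.0
```

With a trained model, the call would look like analogy_score(doc2vec_model, "questions-words.txt"). Keep in mind the earlier note: in PV-DBOW mode without dbow_words=1 the word-vectors are untrained, so the score will be close to zero regardless of how good the doc-vectors are.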