r, word2vec, doc2vec

R package 'word2vec' doc2vec function


I am a student (computer science). This is my first question on Stack Overflow. I would really appreciate your help! (The package I am referring to is called 'word2vec', that's why the tags/title were a bit confusing to choose.)

In the description of the doc2vec function (here https://cran.r-project.org/web/packages/word2vec/word2vec.pdf) it says:

Document vectors are the sum of the vectors of the words which are part of the document standardised by the scale of the vector space. This scale is the sqrt of the average inner product of the vector elements.
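
As I read that description, the computation would be roughly the following. This is only a sketch in Python for illustration, not the package's code: the function name `sum_based_doc_vector` and the toy vectors are made up, and the scaling step (dividing by the square root of the mean squared element of the summed vector) is my interpretation of "sqrt of the average inner product of the vector elements".

```python
import math

def sum_based_doc_vector(tokens, word_vectors):
    """Sum the vectors of known tokens, then rescale the sum.

    Assumption: the 'scale' is sqrt(mean of squared elements) of the
    summed vector. This is one reading of the docs, not verified
    against the package source.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip words without a trained vector
        for i, x in enumerate(vec):
            total[i] += x
    scale = math.sqrt(sum(x * x for x in total) / dim)
    if scale == 0.0:
        return None  # no known words, or the vectors cancelled out
    return [x / scale for x in total]

# Hypothetical 2-dimensional word vectors, for illustration only.
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [1.0, 1.0]}
doc_vec = sum_based_doc_vector("cat dog unknown".split(), toy_vectors)
print(doc_vec)  # [1.0, 1.0]
```

Note that nothing here is trained per document: the result depends only on word vectors that already exist.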

From what I understood, doc2vec learns one additional vector for every paragraph, which, in my eyes, seems to be different from the above description.

Is my understanding of doc2vec correct, or close enough? And does the cited implementation work like the doc2vec algorithm?


Solution

  • Many people use "Doc2Vec" to refer to the word2vec-like algorithm introduced by a paper titled Distributed Representations of Sentences and Documents (by Le & Mikolov). That paper calls the algorithm 'Paragraph Vector', without using the name 'Doc2Vec', and indeed introduces an extra vector per document, like you describe. (That is, the doc-vector is trained a bit like a 'floating' pseudoword-vector that contributes to the input 'context' for every training prediction in that document.)

    I'm not familiar with R or that R word2vec package, but from the docs you linked, it does not sound like its doc2vec function implements the 'Paragraph Vector' algorithm that others call 'Doc2Vec'. In particular:

    • 'Paragraph Vector' doc-vectors are not a simple sum-of-word-vectors

    • 'Paragraph Vector' doc-vectors are created by a separate word2vec-like training process that co-creates any necessary word-vectors simultaneously with that training. Specifically: that process does not normally take pre-trained word-vectors as input, nor does it create word-vectors as a first step. (And further: the PV-DBOW option of the 'Paragraph Vector' paper doesn't create traditional word-vectors at all.)

    It appears that function is poorly named, and if you need to use the actual 'Paragraph Vector' algorithm, you will need to look elsewhere.
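
For contrast, here is a toy sketch of the PV-DBOW idea in plain Python, to show what "an extra vector per document" means in practice. It is an illustration under simplifying assumptions, not a reference implementation: it uses a full softmax over a tiny vocabulary (real implementations typically use negative sampling or hierarchical softmax), and the function names, hyperparameters, and example documents are all made up.

```python
import math
import random

def train_pv_dbow(docs, dim=8, epochs=300, lr=0.1, seed=0):
    """Toy PV-DBOW: learn one vector per document by predicting its words."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # One trainable vector per document: the "extra" vector from the paper.
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    # Output-layer weights, one row per vocabulary word. These are not
    # input word-vectors; PV-DBOW does not train any.
    out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            dv = doc_vecs[d]
            for word in doc:
                target = index[word]
                # Softmax over the vocabulary, given only the doc vector.
                scores = [sum(a * b for a, b in zip(dv, row)) for row in out]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                total = sum(exps)
                grad_doc = [0.0] * dim
                for j, row in enumerate(out):
                    err = exps[j] / total - (1.0 if j == target else 0.0)
                    for k in range(dim):
                        grad_doc[k] += err * row[k]  # uses the pre-update weight
                        row[k] -= lr * err * dv[k]
                for k in range(dim):
                    dv[k] -= lr * grad_doc[k]
    return doc_vecs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical mini-corpus: two documents share words, the third does not.
docs = [
    ["cat", "purrs", "cat", "sleeps"],
    ["cat", "sleeps", "kitten", "purrs"],
    ["stocks", "fall", "bonds", "rise"],
]
vecs = train_pv_dbow(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The doc-vectors here come out of training itself rather than from summing pre-existing word-vectors, which is the key difference from the function described in the question. On this toy corpus, the two documents that share words should end up closer to each other than to the third.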