Tags: gensim, unsupervised-learning, doc2vec

How to measure the accuracy of a Doc2vec model?


I have a dataset of reviews for different hotels. I'm trying to find similar hotels using their reviews, so I'm using the Doc2Vec algorithm to achieve this.

Is there any way to measure the accuracy of a Doc2Vec model in Gensim, rather than manually evaluating the results of Gensim's most_similar() function?
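
For context, here's a minimal sketch of the kind of setup I mean (the hotel ids and review texts are made up, and I'm on Gensim 4.x, where document vectors live in model.dv):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: one combined review text per hotel, keyed by a made-up hotel id.
reviews = {
    "hotel_a": "friendly staff sea view spotless rooms",
    "hotel_b": "clean rooms helpful staff close to the beach",
    "hotel_c": "noisy street slow service dated furniture",
}

# One TaggedDocument per hotel, tagged with the hotel id.
corpus = [TaggedDocument(words=text.split(), tags=[hotel_id])
          for hotel_id, text in reviews.items()]

model = Doc2Vec(vector_size=100, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# What I'm doing today: manually eyeballing nearest neighbours per hotel.
print(model.dv.most_similar("hotel_a", topn=2))
```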


Solution

  • As Doc2Vec (aka the 'Paragraph Vector' algorithm) is an unsupervised method, there are no strictly right or wrong results – just trained models that are better or worse for some downstream task.

    How do you, personally, in your own mind, determine if the results are valuable to your project?

    You have to capture some of that judgement in a repeatable process. For example, one way might be hand-crafting a list of pairs of hotels that, in your expert human-level judgement, "ought to be more similar" to each other than to others, or perhaps ought to appear in each other's "top N" closest results. Then score the Doc2Vec model against that ideal, compared to other methods (or to multiple alternatively-parameterized runs of Doc2Vec) – see the sketch at the end of this answer.

    You might be able to bootstrap some "ought to be more similar" pairs from existing sources of data. For example, maybe two hotels in the same chain "ought to be more similar" to each other than to some random third hotel. (So the outside data of their brand name would guide your evaluation – ideally after checking that the brand name doesn't leak into the document texts used to train the model.) Or maybe two hotels that are near each other both geographically and price-wise "ought to be more similar" than some random third. The sketch below bootstraps triples in exactly this way from brand metadata.

    But there's no standard, automatic notion of "accuracy" for such fuzzy representations across the full range of possible documents and project goals. You need to develop your own custom evaluations in order to choose between algorithms, or to tune any one of them.
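
    As a concrete illustration, here's a minimal, self-contained sketch of one such custom evaluation, assuming Gensim 4.x and entirely hypothetical hotel ids, reviews, and chain metadata: it bootstraps "ought to be more similar" triples from brand names, then scores a trained model by how often its pairwise similarities agree with them.

    ```python
    import random
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Hypothetical outside data: which chain each hotel belongs to. In a real
    # project this comes from hotel metadata, not from the review texts.
    hotel_chain = {
        "hotel_a": "SeasideStays",
        "hotel_b": "SeasideStays",
        "hotel_c": "CityComfort",
        "hotel_d": "CityComfort",
    }

    # Hypothetical toy reviews, one text per hotel, for a runnable demo.
    reviews = {
        "hotel_a": "friendly staff sea view spotless rooms",
        "hotel_b": "clean rooms helpful staff close to the beach",
        "hotel_c": "noisy street slow service dated furniture",
        "hotel_d": "central location small rooms loud traffic",
    }

    def bootstrap_triples(hotel_chain, n=100, seed=0):
        """Build (anchor, partner, outsider) triples: the anchor 'ought to be
        more similar' to its same-chain partner than to the outsider."""
        rng = random.Random(seed)
        by_chain = {}
        for hotel, chain in hotel_chain.items():
            by_chain.setdefault(chain, []).append(hotel)
        usable = [c for c, hotels in by_chain.items() if len(hotels) >= 2]
        triples = []
        for _ in range(n):
            chain = rng.choice(usable)
            anchor, partner = rng.sample(by_chain[chain], 2)
            outsider = rng.choice(
                [h for h, c in hotel_chain.items() if c != chain])
            triples.append((anchor, partner, outsider))
        return triples

    def score_model(model, triples):
        """Fraction of triples where the model ranks the same-chain pair as
        more similar than the cross-chain pair."""
        correct = sum(1 for a, b, c in triples
                      if model.dv.similarity(a, b) > model.dv.similarity(a, c))
        return correct / len(triples)

    # Train one candidate model on the toy corpus.
    corpus = [TaggedDocument(text.split(), [hotel_id])
              for hotel_id, text in reviews.items()]
    model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

    triples = bootstrap_triples(hotel_chain)
    print(score_model(model, triples))  # compare this across models/parameters
    ```

    You'd train several alternatively-parameterized models on the same corpus and keep whichever scores highest against your triples; the same harness works unchanged for hand-crafted triples, or for pairs derived from geography and price.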