I have already trained a gensim Doc2Vec model, which finds the most similar documents to an unknown one.
Now I need to find the similarity value between two unknown documents (which were not in the training data, so they cannot be referenced by doc id).
from gensim.models import doc2vec
d2v_model = doc2vec.Doc2Vec.load(model_file)  # model_file: path to the saved model
string1 = 'this is some random paragraph'
string2 = 'this is another random paragraph'
vec1 = d2v_model.infer_vector(string1.split())
vec2 = d2v_model.infer_vector(string2.split())
In the code above, vec1 and vec2 are successfully initialized, each with length 'vector_size'.
Looking through the gensim API and examples, I could not find a method that works for me; all of them expect a TaggedDocument.
Can I compare the feature vectors value by value, so that closer vectors mean more similar texts?
In case someone is interested: to do this you just need the cosine distance between the two vectors.
I found that most people use scipy's 'spatial' module for this purpose.
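Under the hood, scipy's cosine distance is one minus the cosine similarity, so it can also be computed directly. A minimal NumPy sketch, assuming vec1 and vec2 are 1-D float arrays like the ones infer_vector returns (the helper names are illustrative, not part of gensim or scipy):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    """1 - cosine similarity, the same quantity as scipy.spatial.distance.cosine (range [0, 2])."""
    return 1.0 - cosine_similarity(a, b)

# Toy vectors standing in for inferred document vectors
v1 = np.array([3.0, 4.0])
v2 = np.array([4.0, 3.0])
print(cosine_similarity(v1, v2))  # 0.96
print(cosine_distance(v1, v2))    # about 0.04
```

Cosine measures the angle between the vectors rather than comparing them value by value, so two documents whose vectors point in the same direction count as similar even if their magnitudes differ.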
Here is a small code snippet that should work well if you already have a trained Doc2Vec model:
from gensim.models import doc2vec
from scipy import spatial
d2v_model = doc2vec.Doc2Vec.load(model_file)  # model_file: path to the saved model
first_text = '..'
second_text = '..'
# tokenize the same way as the training corpus; split() is the simplest case
vec1 = d2v_model.infer_vector(first_text.split())
vec2 = d2v_model.infer_vector(second_text.split())
cos_distance = spatial.distance.cosine(vec1, vec2)
# cos_distance is 1 - cosine similarity, ranging from 0 (same direction) to 2 (opposite direction);
# higher values mean more distant (i.e. more different) texts
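To use the distance for ranking, sort candidate texts by ascending cosine distance to a query vector. A minimal sketch, assuming each candidate's vector has already been produced (for example by d2v_model.infer_vector); the helper rank_by_distance and the candidate labels are hypothetical, and toy 2-D vectors are used so it runs without a trained model:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity, matching scipy.spatial.distance.cosine."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_distance(query_vec, candidates):
    """Return (label, distance) pairs sorted from most to least similar.

    candidates: dict mapping a (hypothetical) text label to its vector.
    """
    scored = [(label, cosine_distance(query_vec, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Toy vectors standing in for inferred document vectors
query = np.array([1.0, 0.0])
candidates = {
    "close": np.array([0.9, 0.1]),
    "orthogonal": np.array([0.0, 1.0]),
    "opposite": np.array([-1.0, 0.0]),
}
# prints the labels in the order close, orthogonal, opposite
for label, dist in rank_by_distance(query, candidates):
    print(label, round(dist, 3))
```

Smaller distances sort first, so the most similar text appears at the top of the list.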