I have pairs of words along with the semantic type of each word, and I am trying to compute a relatedness measure between the two words using those types. For example: word1=king, type1=man, word2=queen, type2=woman. We can use gensim's word_vectors.most_similar to get 'queen' from 'king-man+woman'. However, I am looking for a similarity measure between the vector represented by 'king-man+woman' and 'queen'.
I am looking for a solution to the above, or a way to calculate the vector that represents 'king-man+woman' together with a way to calculate the similarity between two vectors from their values in gensim, or a way to calculate the simple mean of the projection weight vectors (i.e. king-man+woman).
You should look at the source code for the gensim most_similar() method, which is used to propose answers to such analogy questions. Specifically, when you try...
sims = wv_model.most_similar(positive=['king', 'woman'], negative=['man'])
...the top result will (in a sufficiently-trained model) often be 'queen' or similar. So, you can look at the source code to see exactly how it calculates the target combination of wv('king') - wv('man') + wv('woman'), before searching all known vectors for the ones closest to that target. See...
...and note that the local variable mean is the combination of the positive and negative values provided.
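To make that calculation concrete, here is a minimal sketch of the same idea in plain Python: build the 'king-man+woman' target and take its cosine similarity with 'queen'. The helper names (unit, analogy_vector, cosine) and the toy 3-dimensional vectors are invented for illustration and are not part of gensim. Unit-normalizing each word vector before combining is my understanding of what most_similar() does, so check that against the source for your gensim version.

```python
import math


def unit(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def analogy_vector(vectors, positive, negative):
    """Add the unit-normalized positive word vectors, subtract the negative
    ones, and return the unit-normalized result."""
    dim = len(vectors[positive[0]])
    total = [0.0] * dim
    for sign, words in ((1.0, positive), (-1.0, negative)):
        for word in words:
            for i, x in enumerate(unit(vectors[word])):
                total[i] += sign * x
    return unit(total)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


# Toy vectors, made up for this example only (not from a trained model).
toy = {
    "king": [0.9, 0.8, 0.1],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.8],
}

target = analogy_vector(toy, positive=["king", "woman"], negative=["man"])
score = cosine(target, toy["queen"])
print(score)
```

With a trained model, anything that maps a word to its vector can stand in for the toy dict. Assuming wv_model is a gensim KeyedVectors instance, wv_model['king'] returns such a vector, so passing wv_model itself as the first argument should work.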
You might also find other methods in the same gensim source useful, either directly or as models for your own code, such as distances()...
...or n_similarity()...