I am learning about word embeddings and cosine similarity. My data consists of the same set of words in two different languages.
I did two tests: one computing soft cosine similarity and one computing the usual cosine similarity between the paired words.
Should I expect to obtain roughly the same results from both? I noticed that sometimes the two give opposite results. Since I am new to this, I am trying to figure out whether I did something wrong or whether there is an explanation behind it. From what I have been reading, soft cosine similarity should be more accurate than the usual cosine similarity.
Now for some data. Unfortunately I can't post part of my data (the words themselves), but I will do my best to give as much information as I can.
Some other details first. This is how I compute the similarity between the averaged vectors of each word pair, as 1 minus SciPy's cosine distance:
from scipy.spatial import distance

# SciPy's cosine() returns a distance, so 1 - distance is the cosine similarity
similarity = 1 - distance.cosine(data['LANG1_AVG'].iloc[i], data['LANG2_AVG'].iloc[i])
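In case it helps, here is a minimal self-contained sketch of that row-by-row computation; the DataFrame contents are toy values, assumed purely for illustration:

import numpy as np
import pandas as pd
from scipy.spatial import distance

# Toy data: each cell holds an averaged word embedding (values assumed)
data = pd.DataFrame({
    'LANG1_AVG': [np.array([0.1, 0.3, 0.5]), np.array([0.2, 0.1, 0.9])],
    'LANG2_AVG': [np.array([0.1, 0.4, 0.4]), np.array([0.9, 0.0, 0.1])],
})

for i in range(len(data)):
    # 1 - cosine distance = cosine similarity
    sim = 1 - distance.cosine(data['LANG1_AVG'].iloc[i], data['LANG2_AVG'].iloc[i])
    print(f"pair {i}: cosine similarity = {sim:.3f}")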
For the usual cosine similarity I am using the FastVector cosine similarity from FastText Multilingual, defined as follows:
import numpy as np

@classmethod
def cosine_similarity(cls, vec_a, vec_b):
    """Compute cosine similarity between vec_a and vec_b"""
    return np.dot(vec_a, vec_b) / \
        (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
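Incidentally, this formula computes exactly the same quantity as 1 - distance.cosine(...) above (for nonzero vectors), so on identical inputs the two plain-cosine routes should agree. A quick sanity check with toy vectors, values assumed for illustration:

import numpy as np
from scipy.spatial import distance

vec_a = np.array([0.2, 0.5, 0.1])
vec_b = np.array([0.4, 0.1, 0.3])

# FastVector-style formula vs. 1 - SciPy's cosine distance
manual = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
scipy_sim = 1 - distance.cosine(vec_a, vec_b)

print(manual, scipy_sim)  # equal up to floating-point error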
As you can see from the image here, for some words I obtained the same or very similar results with the two methods, while for others I obtained two totally different results. How can I explain this?
After some additional research, I found a 2014 paper (Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model) that explains when and how it can be useful to use averages of the features, and also what exactly a soft cosine measure is:
Our idea is more general: we propose to modify the manner of calculation of similarity in Vector Space Model taking into account similarity of features. If we apply this idea to the cosine measure, then the “soft cosine measure” is introduced, as opposed to traditional “hard cosine”, which ignores similarity of features. Note that when we consider similarity of each pair of features, it is equivalent to introducing new features in the VSM. Essentially, we have a matrix of similarity between pairs of features and all these features represent new dimensions in the VSM.
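Concretely, for vectors a and b and a feature-similarity matrix S = (s_ij), the paper's soft cosine is a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)); with S equal to the identity matrix it reduces to the ordinary "hard" cosine. A minimal NumPy sketch of that definition, with a toy similarity matrix assumed purely for illustration:

import numpy as np

def soft_cosine_similarity(vec_a, vec_b, sim_matrix):
    """Soft cosine (Sidorov et al., 2014): a'Sb / (sqrt(a'Sa) * sqrt(b'Sb))."""
    numerator = vec_a @ sim_matrix @ vec_b
    denominator = (np.sqrt(vec_a @ sim_matrix @ vec_a)
                   * np.sqrt(vec_b @ sim_matrix @ vec_b))
    return numerator / denominator

# Toy feature-similarity matrix S (values assumed); s_ii = 1 on the diagonal
S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])

print(soft_cosine_similarity(a, b, S))          # soft cosine, uses feature similarity
print(soft_cosine_similarity(a, b, np.eye(3)))  # with S = I this is the hard cosine

This also makes the divergence plausible: whenever the off-diagonal entries of S are large, the soft and hard measures can score the same pair very differently.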