
Measure similarity between 2 vectors


I'm trying to calculate the similarity between two sentences. I have two sets of words, each representing a sentence, and a function (F) that takes two words and returns the similarity between them, as shown in Image 1.

In Image 1, the black circles are the words of sentence A, and the red squares are the words of sentence B. Each time function (F) receives two words, it returns a value between 0.0 and 1.0; for example, the first word in A and the third word in B have a similarity score of 0.3. I use an M x N comparison because the two sentences often differ in word order and in number of words.
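The M x N comparison described above could be sketched in Java roughly as follows. This is only an illustration: the class name `PairwiseScores`, the method `build`, and the `exactMatch` stand-in for F are hypothetical, since the question does not show how F is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class PairwiseScores {

    // Builds an M x N matrix where cell [i][j] holds f(a.get(i), b.get(j)).
    // f plays the role of F from the question and is assumed to return
    // a value between 0.0 and 1.0.
    static double[][] build(List<String> a, List<String> b,
                            ToDoubleBiFunction<String, String> f) {
        double[][] scores = new double[a.size()][b.size()];
        for (int i = 0; i < a.size(); i++) {
            for (int j = 0; j < b.size(); j++) {
                scores[i][j] = f.applyAsDouble(a.get(i), b.get(j));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Placeholder for F (an assumption): 1.0 for identical words, else 0.0.
        ToDoubleBiFunction<String, String> exactMatch =
                (x, y) -> x.equalsIgnoreCase(y) ? 1.0 : 0.0;

        double[][] scores = build(
                Arrays.asList("the", "cat"),
                Arrays.asList("a", "cat", "sat"),
                exactMatch);

        // One row per word of the first sentence, one column per word of the second.
        System.out.println(Arrays.deepToString(scores));
    }
}
```

Keeping F behind a functional interface means any word-level measure can be swapped in without touching the loop.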

My questions are:

  1. After getting all M x N comparison scores, how can I get a final score between 0.0 and 1.0 that denotes the similarity of the two sentences (or lists), given that the two sentences are not always the same length?

  2. If this approach is not right, what's the alternative?


Solution

  • I got the score for the above chart in the following way:

    1- When I get two lists, the shorter one goes on the left side.

    2- For each word on the left side, I take its maximum score (1.0 in our example), then divide it by the number of words on the right side to get a score for that word.

    3- Finally, I sum the scores of the words and divide the total by the number of words on the left to get the final score: (1 + 0.8)/2 = 0.9

    This type of calculation depends on the nature of the relations, since each word could have several relations above zero. If each word has only one relation above zero, we shouldn't divide the final score by the number of words in the final step.
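One simple reading of the steps above is a "best match average": put the shorter sentence on the rows, take the maximum score in each row, and average those maxima. The Java sketch below follows that reading and deliberately leaves out the per-word division by the size of the right side from step 2, so treat it as a variant and not necessarily the exact calculation used above. The class name `BestMatchAverage` and the sample matrix values are made up for illustration.

```java
final class BestMatchAverage {

    // scores[i][j] is the similarity of word i of one sentence to word j
    // of the other, each assumed to lie in [0.0, 1.0].
    static double score(double[][] scores) {
        if (scores.length == 0 || scores[0].length == 0) {
            return 0.0; // nothing to compare
        }
        // Put the shorter sentence on the rows (the "left side").
        double[][] m = scores.length <= scores[0].length ? scores : transpose(scores);

        double sum = 0.0;
        for (double[] row : m) {
            double best = 0.0;
            for (double v : row) {
                best = Math.max(best, v); // best match for this word
            }
            sum += best;
        }
        // Average of the per-word maxima, so the result stays in [0.0, 1.0].
        return sum / m.length;
    }

    // Swaps rows and columns of a rectangular matrix.
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Hypothetical 2 x 3 score matrix; the row maxima are 1.0 and 0.8.
        double[][] scores = {
            {1.0, 0.3, 0.0},
            {0.2, 0.8, 0.1}
        };
        System.out.printf("%.2f%n", score(scores));
    }
}
```

Because every row maximum is at most 1.0, the average needs no further normalization. The trade-off is that words of the longer sentence with no counterpart in the shorter one do not lower the score; averaging this value with the same calculation done from the longer side is one possible way to account for them.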