
How to iterate over the dictionary keys to calculate cosine similarity using the values?


I have a dictionary like this:

inputDict = {"in": [0.01, -0.07, 0.09, -0.02], "and": [0.2, 0.3, 0.5, 0.6], "to": [0.87, 0.98, 0.54, 0.4]}

I want to calculate the cosine similarity between every pair of words. I have written a function that takes two vectors: it should first be called with the values for 'in' and 'and', then with the values for 'in' and 'to', and so on.

I want to store the results in another dictionary, where 'in' is a key and its value holds the cosine similarities between 'in' and each of the other words. I want the same kind of entry for every other word as well.

This is my function to calculate cosine similarity:

import math

def cosine_similarity(vec1, vec2):
    # sum11 and sum22 are the squared norms, sum12 is the dot product.
    sum11, sum12, sum22 = 0, 0, 0
    for x, y in zip(vec1, vec2):
        sum11 += x*x
        sum22 += y*y
        sum12 += x*y
    return sum12/math.sqrt(sum11*sum22)

vec1 and vec2 are two lists such as [0.01, -0.07, 0.09, -0.02] and [0.2, 0.3, 0.5, 0.6]; for these two the function returns roughly 0.14.

How do I compute this for every key and store the results in nested dictionaries like the following (the numbers are placeholders)?

{"in": {"and": 0.4321, "to": 0.218}, "and": {"in": 0.1245, "to": 0.9876}, "to": {"in": 0.8764, "and": 0.123}}

Solution

  • import math

    inputDict = {"in": [0.01, -0.07, 0.09, -0.02], "and": [0.2, 0.3, 0.5, 0.6], "to": [0.87, 0.98, 0.54, 0.4]}

    def cosine_similarity(vec1, vec2):
        # sum11 and sum22 are the squared norms, sum12 is the dot product.
        sum11, sum12, sum22 = 0, 0, 0
        for x, y in zip(vec1, vec2):
            sum11 += x*x
            sum22 += y*y
            sum12 += x*y
        return sum12/math.sqrt(sum11*sum22)

    result = {}
    for key, value in inputDict.items():
        # Similarities between `key` and every other word.
        tempDict = {}
        for keyC, valueC in inputDict.items():
            if keyC == key:
                continue  # do not compare a word with itself
            tempDict[keyC] = cosine_similarity(value, valueC)
        result[key] = tempDict

    print(result)

    Output:

    {'in': {'and': 0.14007005254378826, 'to': -0.11279001655020567}, 'and': {'in': 0.14007005254378826, 'to': 0.7719749900051109}, 'to': {'in': -0.11279001655020567, 'and': 0.7719749900051109}}
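
Cosine similarity is symmetric, which is why each value appears twice in the output above. As a possible variation, here is a minimal sketch that computes each pair only once with `itertools.combinations` and writes the score in both directions. The helper name `pairwise_similarity` is made up for this example, and the sketch assumes all vectors have the same length and none of them is all zeros (that would divide by zero).

```python
import math
from itertools import combinations

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm = math.sqrt(sum(x * x for x in vec1) * sum(y * y for y in vec2))
    return dot / norm

def pairwise_similarity(vectors):
    # Hypothetical helper: start with one empty inner dict per word.
    result = {key: {} for key in vectors}
    # combinations() yields every unordered pair of keys exactly once.
    for key_a, key_b in combinations(vectors, 2):
        score = cosine_similarity(vectors[key_a], vectors[key_b])
        result[key_a][key_b] = score
        result[key_b][key_a] = score
    return result

inputDict = {
    "in": [0.01, -0.07, 0.09, -0.02],
    "and": [0.2, 0.3, 0.5, 0.6],
    "to": [0.87, 0.98, 0.54, 0.4],
}
result = pairwise_similarity(inputDict)
print(result["and"]["to"] == result["to"]["and"])  # True
```

With n words this calls `cosine_similarity` n*(n-1)/2 times instead of n*(n-1), which only starts to matter once the dictionary gets large.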