text, data-mining, cosine-similarity

Can someone show me how to work out simple cosine similarity graphically?


Can someone show me how to work out cosine similarity, please? I understand that someone has answered a similar question before (similar question link), but I do not understand how the end result was reached.


Solution

  • The cosine similarity equation is

    $$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

    In the linked similar question, the chosen answer computes two vectors that hold the word counts of two sentences:

    A = (2,1,0,2,0,1,1,1)
    B = (2,1,1,1,1,0,1,1)
    

    The dot product of A and B is therefore

    dotProduct(A,B) = 2x2 + 1x1 + 0x1 + 2x1 + 0x1 + 1x0 + 1x1 + 1x1 = 9
    

    and the magnitudes of A and B are

    magnitude(A) = sqrt(2x2 + 1x1 + 0x0 + 2x2 + 0x0 + 1x1 + 1x1 + 1x1) = 3.464
    magnitude(B) = sqrt(2x2 + 1x1 + 1x1 + 1x1 + 1x1 + 0x0 + 1x1 + 1x1) = 3.162
    

    Then we can apply the equation:

    similarity = cos(theta) = dotProduct(A,B) / (magnitude(A) x magnitude(B))
                            = 9 / (3.464 x 3.162)
                            = 0.822
    

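    where theta is the angle between vectors A and B. The closer the result is to 1, the smaller that angle is and the more similar the two word-count vectors are.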
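
If it helps to check the arithmetic, here is a minimal Python sketch of the same steps using only the standard library. The function name `cosine_similarity` and the variable names are my own choices for illustration, not something taken from the linked answer; the vectors A and B are the ones given above.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Dot product: multiply matching components and add them up.
    dot_product = sum(x * y for x, y in zip(a, b))

    # Magnitude: square root of the sum of squared components.
    magnitude_a = math.sqrt(sum(x * x for x in a))
    magnitude_b = math.sqrt(sum(y * y for y in b))

    if magnitude_a == 0 or magnitude_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot_product / (magnitude_a * magnitude_b)


# Word-count vectors from the answer above.
A = (2, 1, 0, 2, 0, 1, 1, 1)
B = (2, 1, 1, 1, 1, 0, 1, 1)

similarity = cosine_similarity(A, B)

# Angle between the two vectors, which is the "graphical" reading of the result.
theta_degrees = math.degrees(math.acos(similarity))

print(round(similarity, 3))  # 0.822
print(round(theta_degrees, 1))
```

The zero-vector check is an addition of mine: the formula divides by both magnitudes, so a sentence with no counted words would otherwise cause a division by zero.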