Cosine similarity in Theano


What is the easiest way to compute cosine similarity with NumPy and Theano? The vectors are given as NumPy arrays.

I've tried computing a cosine similarity matrix using only NumPy, and it is maddeningly slow. I am completely new to Theano, but I suspect the library could help me build the matrix faster.

Well, help! :)


Solution

  • Here's a post about cosine similarity in Python: Cosine Similarity between 2 Number Lists.

    I rewrote that answer in NumPy and Theano:

    import math
    import numpy
    import theano
    import theano.tensor

    def cos_sim_numpy(v1, v2):
        # dot(v1, v2) / (|v1| * |v2|)
        numerator = numpy.sum(v1 * v2)
        denominator = math.sqrt(numpy.sum(v1 ** 2) * numpy.sum(v2 ** 2))
        return numerator / denominator

    def compile_cos_sim_theano():
        # Build the symbolic expression once and compile it into a callable.
        v1 = theano.tensor.vector(dtype=theano.config.floatX)
        v2 = theano.tensor.vector(dtype=theano.config.floatX)
        numerator = theano.tensor.sum(v1 * v2)
        denominator = theano.tensor.sqrt(theano.tensor.sum(v1 ** 2) * theano.tensor.sum(v2 ** 2))
        return theano.function([v1, v2], numerator / denominator)

    cos_sim_theano_fn = compile_cos_sim_theano()

    v1 = numpy.asarray([3, 45, 7, 2], dtype=numpy.float32)
    v2 = numpy.asarray([2, 54, 13, 15], dtype=numpy.float32)

    print(cos_sim_theano_fn(v1, v2), cos_sim_numpy(v1, v2))
    
    Output: 0.972284251712 0.972284251712
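
The functions above compare one pair of vectors at a time, while the question asks for a whole similarity matrix. Calling a pairwise function in a Python loop is a likely cause of the slowness, so here is a minimal NumPy-only sketch that normalizes every row once and gets all pairs from a single matrix product. The function name `cosine_similarity_matrix` and the `eps` guard against zero-length rows are my own assumptions, not part of the original answer, and I have not benchmarked this against a Theano version.

```python
import numpy as np

def cosine_similarity_matrix(X, eps=1e-12):
    """Return S where S[i, j] is the cosine similarity of rows X[i] and X[j].

    `eps` is an assumed guard: it keeps all-zero rows from dividing by zero
    (such rows end up with similarity 0 to everything, including themselves).
    """
    X = np.asarray(X, dtype=np.float64)
    # Euclidean norm of each row, kept as a column so it broadcasts over X.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.maximum(norms, eps)
    # The dot product of unit vectors is their cosine similarity.
    return X_unit.dot(X_unit.T)

# The two vectors from the example above, stacked as rows.
vectors = np.array([[3, 45, 7, 2],
                    [2, 54, 13, 15]])
S = cosine_similarity_matrix(vectors)
print(S)
```

The off-diagonal entry `S[0, 1]` should agree with the pairwise value printed above, and the diagonal should be 1 for any non-zero row. The same normalize-then-multiply idea could be expressed with `theano.tensor` operations, but that is left out here.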