Tags: python, numpy, scipy, scikit-learn, sparse-matrix

Is there a way to calculate similarity between one sparse vector and a matrix?


How can I calculate the (e.g., cosine) similarity between one sparse vector and a matrix (i.e., an array of sparse vectors)?
Is this possible using scikit-learn, scipy, numpy, etc.? Ideally, the similarity metric should be easy to change.


Solution

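  • For cosine similarity, you can use pairwise_distances from sklearn.metrics.pairwise with metric='cosine'. Given a matrix and a vector, it returns a matrix with one entry per row of the input matrix. Note that these values are cosine distances, i.e. 1 minus the cosine similarity, so subtract them from 1 to get similarities.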

    Illustration:

    import numpy as np
    from sklearn.metrics.pairwise import pairwise_distances
    
    mat_1 = np.array([[1, 2, 3], [3, 4, 5]])
    # The vector must be 2-D (a single row) and have the same
    # number of columns as the matrix
    vec_1 = np.array([[2, 3, 5]])
    
    # Cosine distance from each row of mat_1 to vec_1
    >>> print(pairwise_distances(mat_1, vec_1, metric='cosine'))
    [[0.00282354]
     [0.01351234]]
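
Because the values above are distances rather than similarities, here is a minimal sketch of getting similarities directly. It assumes the cosine_similarity function from the same sklearn.metrics.pairwise module; the variable names are only illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

mat = np.array([[1, 2, 3], [3, 4, 5]])
vec = np.array([[2, 3, 5]])  # 2-D: one row, same number of columns as mat

# Cosine distance from every row of mat to vec, shape (2, 1)
dist = pairwise_distances(mat, vec, metric="cosine")

# Cosine similarity of every row of mat to vec, shape (2, 1)
sim = cosine_similarity(mat, vec)

# The two are complementary: similarity = 1 - distance
assert np.allclose(sim, 1.0 - dist)
```

Either call works; which one to use depends on whether the rest of the code expects "larger is closer" (similarity) or "smaller is closer" (distance).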
    

    Note: To use a different distance metric, pass its name to the metric parameter. However, if your input is a sparse matrix, only the following metrics can be used: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']. The other metrics do not support sparse input.
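
The question asks about sparse inputs and a swappable metric, so here is a rough sketch of how that could look with scipy.sparse CSR matrices. The helper distances_to_vector is a hypothetical name introduced for this example, not part of scikit-learn, and the sample data is made up.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances


def distances_to_vector(matrix, vector, metric="cosine"):
    """Distance from each row of `matrix` to the single-row `vector`.

    Hypothetical helper: converts both inputs to CSR and delegates to
    sklearn's pairwise_distances. `metric` should be one of the metrics
    that accept sparse input (e.g. 'cosine', 'euclidean', 'manhattan').
    """
    matrix = sparse.csr_matrix(matrix)
    vector = sparse.csr_matrix(vector)
    if vector.shape != (1, matrix.shape[1]):
        raise ValueError("vector must have shape (1, n_features)")
    # pairwise_distances returns a dense (n_rows, 1) array; flatten it
    return pairwise_distances(matrix, vector, metric=metric).ravel()


# Made-up sample data: three sparse rows and one sparse query vector
rows = sparse.csr_matrix(np.array([[1, 0, 0, 2],
                                   [0, 3, 0, 0],
                                   [1, 0, 0, 2]]))
query = sparse.csr_matrix(np.array([[1, 0, 0, 2]]))

cosine_d = distances_to_vector(rows, query, metric="cosine")
manhattan_d = distances_to_vector(rows, query, metric="manhattan")
```

Changing the metric is then a matter of passing a different name; the result is a dense 1-D array with one distance per matrix row.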


    Further reading: Pairwise metrics, Affinities and Kernels (scikit-learn user guide)