python numpy scikit-learn similarity cosine-similarity

Compute cosine similarity between 3D numpy array and 2D numpy array


I have a 3D numpy array A of shape (m, n, 300) and a 2D numpy array B of shape (p, 300).

For each of the m (n, 300) matrices in the 3D array, I want to compute its cosine similarity matrix with the 2D numpy array. Currently, I am doing the following:

import sklearn.metrics.pairwise

result = []
for sub_matrix in A:
    result.append(sklearn.metrics.pairwise.cosine_similarity(sub_matrix, B))

The sklearn cosine_similarity function does not accept 3D arrays. Is there a more efficient way to compute this without the for-loop?


Solution

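  • Cosine similarity is computed between individual rows, so you can collapse the first two axes of A into one, call the same function on the resulting 2D array, and reshape the result back to 3D -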

    from sklearn.metrics.pairwise import cosine_similarity
    
    m,n = A.shape[:2]
    out = cosine_similarity(A.reshape(m*n,-1), B).reshape(m,n,-1)
    

    The final reshape restores the leading (m, n) axes, so out has shape (m, n, p). This is the same array you would get from np.array(result) in the loop version.

    Sample run -

    In [336]: import numpy as np
         ...: from sklearn.metrics.pairwise import cosine_similarity
         ...: 
         ...: np.random.seed(0)
         ...: A = np.random.rand(5,4,3)
         ...: B = np.random.rand(2,3)
         ...: 
         ...: # Loop-based reference from the question
         ...: result = []
         ...: for sub_matrix in A:
         ...:     result.append(cosine_similarity(sub_matrix, B))
         ...: out_org = np.array(result)
         ...: 
         ...: # Reshape-based version
         ...: m,n = A.shape[:2]
         ...: out = cosine_similarity(A.reshape(m*n,-1), B).reshape(m,n,-1)
         ...: 
         ...: print(np.allclose(out_org, out))
    True
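
If you would rather not depend on scikit-learn for this step, the same result can probably be obtained with plain NumPy. This is a different technique from the reshape approach above: L2-normalize the rows of both arrays, then let matrix multiplication broadcast over the leading axis of A. The following is a minimal sketch, not a drop-in replacement. The function name `cosine_similarity_3d` and the `eps` guard against all-zero rows are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_3d(A, B, eps=1e-12):
    """Cosine similarity between every row of every A[i] and every row of B.

    A has shape (m, n, d) and B has shape (p, d); the result has shape (m, n, p).
    eps is a hypothetical guard so that all-zero rows do not divide by zero.
    """
    # Scale each row to unit length along the feature axis.
    A_unit = A / np.maximum(np.linalg.norm(A, axis=-1, keepdims=True), eps)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=-1, keepdims=True), eps)
    # (m, n, d) @ (d, p) broadcasts over the leading axis and gives (m, n, p).
    return A_unit @ B_unit.T

rng = np.random.default_rng(0)
A = rng.random((5, 4, 3))
B = rng.random((2, 3))

out = cosine_similarity_3d(A, B)
print(out.shape)
```

For dense float inputs this should agree with the reshape-based call up to floating-point error. Checking it against `cosine_similarity` with `np.allclose` on your own data is a sensible first step before relying on it.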