I have a 3D numpy array A of shape (m, n, 300) and a 2D numpy array B of shape (p, 300).
For each of the m (n, 300) matrices in the 3D array, I want to compute its cosine similarity matrix with the 2D numpy array. Currently, I am doing the following:
import sklearn.metrics.pairwise

result = []
for sub_matrix in A:
    result.append(sklearn.metrics.pairwise.cosine_similarity(sub_matrix, B))
The sklearn cosine_similarity function does not support operations with 3D arrays, so is there a more efficient way of computing this that does not involve using the for-loop?
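Cosine similarity is computed independently for each pair of rows, so you can reshape A to 2D by merging its first two axes and use the same function -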
from sklearn.metrics.pairwise import cosine_similarity
m,n = A.shape[:2]
out = cosine_similarity(A.reshape(m*n,-1), B).reshape(m,n,-1)
The output would be 3D, with shape (m, n, p), after the reshape at the end, which is what you would get after array conversion of result, i.e. np.array(result).
Sample run -
In [336]: import numpy as np
     ...: from sklearn.metrics.pairwise import cosine_similarity
     ...:
     ...: np.random.seed(0)
     ...: A = np.random.rand(5,4,3)
     ...: B = np.random.rand(2,3)
     ...:
     ...: # Loop-based reference from the question
     ...: result = []
     ...: for sub_matrix in A:
     ...:     result.append(cosine_similarity(sub_matrix, B))
     ...: out_org = np.array(result)
     ...:
     ...: # Reshape-based version
     ...: m,n = A.shape[:2]
     ...: out = cosine_similarity(A.reshape(m*n,-1), B).reshape(m,n,-1)
     ...:
     ...: print(np.allclose(out_org, out))
True
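If you would rather not depend on sklearn here, the same idea can be sketched in plain NumPy: scale every row to unit length, then take dot products, letting matmul broadcast over the leading axis. This is only a minimal sketch and not sklearn's implementation; the function name cosine_similarity_3d and the eps guard against all-zero rows are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_3d(A, B, eps=1e-12):
    """Cosine similarity between every row of A (..., d) and every row of B (p, d).

    Hypothetical helper for illustration; eps is an assumed guard that avoids
    dividing by zero when a row is all zeros.
    """
    # Scale each row to unit length along the last axis.
    A_unit = A / np.maximum(np.linalg.norm(A, axis=-1, keepdims=True), eps)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=-1, keepdims=True), eps)
    # matmul broadcasts over leading axes: (m, n, d) @ (d, p) -> (m, n, p)
    return A_unit @ B_unit.T

rng = np.random.default_rng(0)
A = rng.random((5, 4, 3))
B = rng.random((2, 3))

out = cosine_similarity_3d(A, B)
print(out.shape)  # (5, 4, 2)
```

Because each output entry depends on only one row of A and one row of B, flattening A to (m*n, d) first and reshaping the result back to (m, n, p) gives the same values, which is why the reshape approach above matches the loop.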