Tags: matlab, performance, matrix, cosine-similarity

How to compute the pairwise cosine similarity for a data matrix in MATLAB


Assume there is a data matrix in MATLAB:

X = [0.8147, 0.9134, 0.2785, 0.9649, 0.9572;
     0.9058, 0.6324, 0.5469, 0.1576, 0.4854;
     0.1270, 0.0975, 0.9575, 0.9706, 0.8003]

Each column represents the feature vector of one sample. What is the fastest way to get the pairwise cosine similarity of the columns of X in MATLAB? That is, we want to compute the symmetric 5x5 matrix S whose element S(3,4) is the cosine between the third and fourth columns.

Note: The cosine measure cos(a,b) is the cosine of the angle between vectors a and b.


Solution

  • If you have the Statistics Toolbox (now the Statistics and Machine Learning Toolbox), use pdist with the 'cosine' option, followed by squareform. Note that:

    • pdist treats rows, not columns, as observations, so you need to transpose the input.
    • The output is the cosine distance, that is, 1 minus the cosine similarity, so you need to subtract the result from 1.
    • pdist returns the distances as a vector. To get them in the form of a symmetric matrix, apply squareform.

    So, you can use

    S = 1 - squareform(pdist(X.', 'cosine'));
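
If the toolbox is not available, the same matrix can probably be obtained straight from the definition cos(a,b) = a'*b / (norm(a)*norm(b)). The following is only a minimal sketch, not part of the answer above: it assumes that no column of X is all zeros (such a column would give NaN), and the names colNorms, Xn and s34 are made up for illustration.

```matlab
% Data matrix from the question: each column is one sample.
X = [0.8147, 0.9134, 0.2785, 0.9649, 0.9572;
     0.9058, 0.6324, 0.5469, 0.1576, 0.4854;
     0.1270, 0.0975, 0.9575, 0.9706, 0.8003];

% Euclidean norm of each column (1-by-5 row vector).
colNorms = sqrt(sum(X.^2, 1));

% Scale every column to unit length. bsxfun is used so that this also
% runs on releases without implicit expansion (before R2016b).
Xn = bsxfun(@rdivide, X, colNorms);

% Dot products of unit-length columns are the pairwise cosines (5-by-5).
S = Xn.' * Xn;

% Spot check against the definition for the third and fourth columns.
s34 = dot(X(:,3), X(:,4)) / (norm(X(:,3)) * norm(X(:,4)));
```

Here S(3,4) and s34 should agree up to floating-point rounding. One difference from the pdist version: the diagonal of this S is computed rather than set, so it may differ from 1 in the last few bits.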