python numpy linear-algebra pearson-correlation

Compute correlations of several vectors


I have several pairs of vectors, stored as the columns of two matrices, and I want to compute the vector of their pairwise correlation coefficients (or, better yet, the angles between them; since the correlation coefficient is the cosine of the angle between the mean-centered vectors, I am using numpy.corrcoef):

np.array([np.corrcoef(m1[:,i],m2[:,i])[0,1]
          for i in range(m1.shape[1])])

I wonder if there is a way to "vectorize" this, i.e., avoid calling corrcoef several times.


Solution

  • Instead of using np.corrcoef, you can write your own function that does the same thing. The calculation for the correlation coefficient of two vectors is quite simple:

    $$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

    Applying that here:

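    import numpy as np

    def vec_corrcoef(X, Y, axis=1):
        # Correlate corresponding slices of X and Y along `axis`
        # (axis=1 pairs up rows; pass axis=0 to pair up columns).
        Xm = np.mean(X, axis=axis, keepdims=True)
        Ym = np.mean(Y, axis=axis, keepdims=True)
        N = np.sum((X - Xm) * (Y - Ym), axis=axis)  # numerator: summed products of deviations
        D = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((Y - Ym)**2, axis=axis))
        return N / D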
    

    To test (here each row is treated as a vector; for the column layout in the question, pass axis=0):

    m1 = np.random.random((100, 10))
    m2 = np.random.random(m1.shape)
    
    a = vec_corrcoef(m1, m2)
    b = [np.corrcoef(v1, v2)[0, 1] for v1, v2 in zip(m1, m2)]
    
    print(np.allclose(a, b)) # True
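
The question stores its vectors as columns and also asks about angles, so the following is a minimal sketch of how the function above might be applied to that layout. It repeats vec_corrcoef so that it runs on its own; the seeded generator `rng`, the array sizes, and the names `r`, `r_loop`, and `angles` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def vec_corrcoef(X, Y, axis=1):
    # Same computation as in the answer, repeated so this block is self-contained.
    Xm = np.mean(X, axis=axis, keepdims=True)
    Ym = np.mean(Y, axis=axis, keepdims=True)
    N = np.sum((X - Xm) * (Y - Ym), axis=axis)
    D = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((Y - Ym)**2, axis=axis))
    return N / D

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
m1 = rng.random((100, 10))      # 10 vectors of length 100, stored as columns
m2 = rng.random(m1.shape)

# axis=0 reduces over rows, giving one coefficient per pair of columns.
r = vec_corrcoef(m1, m2, axis=0)

# The loop from the question, for comparison.
r_loop = np.array([np.corrcoef(m1[:, i], m2[:, i])[0, 1]
                   for i in range(m1.shape[1])])

# Angles (in radians) between the mean-centered vectors; the clip guards
# against rounding pushing a coefficient slightly outside [-1, 1].
angles = np.arccos(np.clip(r, -1.0, 1.0))

print(np.allclose(r, r_loop))
print(r.shape, angles.shape)
```

Note that these are angles between the mean-centered vectors. If the angle between the raw vectors is wanted instead, the mean subtraction would have to be dropped, which is no longer what np.corrcoef computes.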