Tags: python-3.x, numpy, correlation, pearson-correlation

numpy.corrcoef() doubts about return value


I need Pearson's correlation coefficient between two matrices X and Y. If I run corr = numpy.corrcoef(X,Y), the output is a matrix of correlation coefficients. However, I need a single value that represents the correlation between the two matrices.

I just saw in kennytm's answer that to get a single value I should write numpy.corrcoef(X,Y)[1,0].

This solution works, but I don't understand what the numbers inside the square brackets mean or why adding them gives a single value.

I'm interpreting 1 and 0 as the limits of the coefficient, but what happens to all the other coefficients in the matrix? What operation is applied to them to obtain a single value? If I change the numbers inside the square brackets, for example to [1,-1] (correlation, anticorrelation), the value of corr changes, so I'm confused about which numbers I should use.


Solution

  • numpy.corrcoef returns a matrix containing the correlation coefficient for every pair of rows (with the default rowvar=True, each row is treated as one variable). The numbers in square brackets are ordinary array indices, [row, column], that pick one element out of that matrix; no operation is performed on the other coefficients. For example, numpy.corrcoef(A,B) for A.shape=(3,3) and B.shape=(3,3) returns a (6,6) matrix: the rows of A and B are stacked into 6 variables, giving 36 ordered row pairs. The matrix is symmetric, since it contains both the correlation of A[1] with B[1] (index [1,4]) and of B[1] with A[1] (index [4,1]). When you pass two 1-D arrays, you get a (2,2) matrix: the correlation of the first array with itself at [0,0], of the first array with the second at [0,1], of the second array with the first at [1,0], and of the second array with itself at [1,1]. The diagonal entries are always 1, and [0,1] equals [1,0].

    import numpy as np
    A = np.random.randint(low=0, high=99, size=(3,3))
    B = np.random.randint(low=0, high=99, size=(3,3))
    C = np.corrcoef(A,B)
    # Compare with a tolerance, since exact float equality is not guaranteed
    print(np.isclose(C[1,4], np.corrcoef(A[1],B[1])[0,1])) # True
    

    If you want the 2-D correlation (as between images), flatten the 2-D arrays so that each one becomes a single 1-D array. Then element [0,1] (or, equivalently, [1,0]) of the resulting (2,2) correlation matrix is the correlation between the two full 2-D arrays.

    print(np.corrcoef(A.flatten(), B.flatten())[0,1])
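
To make the indexing concrete, here is a minimal sketch of the (2,2) case described above. The arrays x and y and the names R, r_xy, r_yx and r_manual are made up for illustration and are not from the question. It checks that [0,1] and [1,0] hold the same value, that [1,-1] is just another way of writing [1,1] (a negative index counts from the end), and that the off-diagonal element matches Pearson's formula computed by hand.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# Two 1-D inputs give a (2, 2) correlation matrix.
R = np.corrcoef(x, y)
print(R.shape)

# [row, column] indexing selects one element; nothing is aggregated.
r_xy = R[0, 1]  # correlation of x with y
r_yx = R[1, 0]  # correlation of y with x (same value, matrix is symmetric)
print(np.isclose(r_xy, r_yx))

# A negative index counts from the end, so [1, -1] is the element [1, 1]:
# the correlation of y with itself, which is 1.
print(np.isclose(R[1, -1], 1.0))

# Pearson's r computed by hand from the centered data.
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
print(np.isclose(r_xy, r_manual))
```

So for two 1-D (or flattened) arrays, the value you want is the off-diagonal one, [0,1] or [1,0]. An index such as [1,-1] lands on the diagonal, which is why the result changes to 1.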