
Calculate the Pearson correlation between corresponding rows of two 2D NumPy arrays (n, m)


import numpy as np

a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])

b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])

a = array([[1, 2, 4],
       [3, 6, 2],
       [3, 4, 7],
       [9, 7, 7],
       [6, 3, 1],
       [3, 5, 9]])

b = array([[4, 5, 2],
       [9, 2, 5],
       [1, 5, 6],
       [4, 5, 6],
       [1, 2, 6],
       [6, 4, 3]])

I would like to calculate the Pearson correlation coefficient between the first row of a and the first row of b, the second row of a and the second row of b, and so on for each subsequent row.

The desired output should be a 1D array:

array([__ , __ , __])

Column-wise, I can do it as below:

corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])

Output:

array([-0.2324843 , -0.03631365, -0.18057878])

UPDATE

Although I accepted the answer below, there is an alternative solution that also addresses division-by-zero issues:

def corr2_coeff(A, B):
  # Subtract each row's mean from that row of the input arrays
  A_mA = A - A.mean(1)[:, None]
  B_mB = B - B.mean(1)[:, None]

  # Sum of squared deviations within each row
  ssA = (A_mA**2).sum(1)
  ssB = (B_mB**2).sum(1)

  # Small constant keeps the denominator non-zero for constant rows
  deno = np.sqrt(np.dot(ssA[:, None],ssB[None])) + 0.00000000000001

  # Correlation of every row of A with every row of B;
  # the row-by-row values lie on the diagonal of the result
  return np.dot(A_mA, B_mB.T) / deno
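
Note that corr2_coeff returns the correlation of every row of A with every row of B, so only its diagonal answers the question. A minimal sketch of a variant that computes just the matching-row values is given below. The name rowwise_corr and the eps parameter are illustrative assumptions, not part of NumPy or of the accepted answer.

```python
import numpy as np

def rowwise_corr(A, B, eps=1e-14):
    """Pearson correlation between matching rows of A and B (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Centre each row on its own mean
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    # Row-wise sum of cross products (numerator of Pearson's r)
    num = (A_mA * B_mB).sum(axis=1)
    # Row-wise denominator; eps keeps it non-zero for constant rows
    deno = np.sqrt((A_mA**2).sum(axis=1) * (B_mB**2).sum(axis=1)) + eps
    return num / deno

a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])
b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])
result = rowwise_corr(a, b)  # 1D array with one value per row

# A constant row has zero variance: np.corrcoef yields nan there,
# while the eps-guarded version yields 0.0 instead.
with np.errstate(invalid="ignore", divide="ignore"):
    nan_case = np.corrcoef([1, 1, 1], [1, 2, 3])[0, 1]
guarded_case = rowwise_corr([[1, 1, 1]], [[1, 2, 3]])[0]
```

Whether 0.0 is the right value to report for a constant row is a design choice; the correlation is undefined in that case, and some callers may prefer to keep nan.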

Solution

  • I would write the column-wise calculation from the question as:

    [np.corrcoef(x, y)[0,1] for x, y in zip(a.T, b.T)]
    

    with the result:

    [-0.23248430170889073, -0.03631365196012811, -0.18057877962865382]
    

    The row-by-row correlations are then obtained by simply removing the transposes:

    [np.corrcoef(x,y)[0,1] for x, y in zip(a, b)]
    

    with the result:

    [-0.7857142857142855,
     -0.661143091251952,
     0.8170571691028832,
     -0.8660254037844387,
     -0.9011271137791659,
     -0.9285714285714285]
    

    If you prefer the approach used in the question, the same result can be obtained with:

    np.corrcoef(a, b).diagonal(a.shape[0])
    

    OR

    np.corrcoef(a.T, b.T, rowvar=False).diagonal(a.shape[0])
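
To see why the diagonal offset works: np.corrcoef(a, b) treats the rows of a and b as one stacked set of variables, so for n rows it returns a (2n, 2n) matrix whose entry [i, n + i] is the correlation between a[i] and b[i]. The sketch below checks this against the list comprehension, reusing the arrays from the question. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])
b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])
n = a.shape[0]

# Rows of a and b are stacked, giving a (2n, 2n) correlation matrix
full = np.corrcoef(a, b)

# Upper-right block: correlation of a[i] with b[j] for all i, j
upper_right = full[:n, n:]

# The diagonal with offset n picks the entries [i, n + i]
via_diagonal = full.diagonal(n)

# Reference: one 2x2 corrcoef call per pair of rows
via_loop = np.array([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)])

print(np.allclose(via_diagonal, via_loop))
```

The diagonal approach computes all (2n)² entries only to keep n of them, so for a large number of rows the per-pair loop or a direct row-wise formula may use less memory.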