Tags: python, arrays, numpy, scipy, numpy-ndarray

Self differencing function for 2D numpy array (correlation?)


I have a 2D array mat of shape ~ (3000, 5). I want to subtract every column from each of the others, skipping the subtraction of a column from itself. The code below is what I have right now, and it works. I just want a more efficient way of doing this, something like a self-correlation. Is there a numpy or scipy function that does this in fewer steps than the usual for loop?

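import numpy as np

n_rows, n_cols = mat.shape
# One length-n_rows slot per column pair, pre-filled with NaN.
agreement = np.full((n_cols, n_cols, n_rows), np.nan)
for i in range(n_cols - 1):
    for j in range(i + 1, n_cols):
        # Difference of columns i and j; only the upper triangle is filled.
        agreement[i, j] = mat[:, i] - mat[:, j]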

The result is a 2D grid agreement whose entries are themselves arrays, and it looks something like this:

array([[nan [......] [......] [......] [......]]
       [nan nan [......] [......] [......]]
       [nan nan nan [......] [......]]
       [nan nan nan nan [......]]
       [nan nan nan nan nan]])

Each [......] stands for the diff of the two corresponding columns and therefore has length mat.shape[0] (about 3000); the printout is only schematic, not an accurate representation. Each nan stands for the array [np.nan]*mat.shape[0].


Solution

  • np.corrcoef(mat, rowvar=False) will give you the actual correlation matrix for your 5 columns.

    To build the array of diffs you describe, use:

    agreement = (
        np.stack([mat] * mat.shape[1], axis=1).T - 
        np.stack([mat.T] * mat.shape[1], axis=0)
    )
    

    In this array, agreement[a, b] = mat[:, a] - mat[:, b] for every pair (a, b), so the diagonal is all zeros and the lower triangle holds the negated differences instead of nan.
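
The same array can probably be built with plain broadcasting instead of np.stack, and the diagonal and lower triangle can then be set to nan to match the loop's output. This is only a sketch: the random mat is a hypothetical stand-in for the real data, and the name cols is introduced here for illustration.

```python
import numpy as np

# Hypothetical stand-in for the (3000, 5) array from the question.
rng = np.random.default_rng(0)
mat = rng.normal(size=(3000, 5))

# One row per original column: shape (5, 3000).
cols = mat.T

# Broadcasting (5, 1, 3000) against (1, 5, 3000) yields (5, 5, 3000),
# with agreement[a, b] == mat[:, a] - mat[:, b].
agreement = cols[:, None, :] - cols[None, :, :]

# Overwrite the diagonal and the lower triangle with nan so that only
# the pairs a < b keep their differences, as in the original loop.
low_a, low_b = np.tril_indices(mat.shape[1])  # k=0 includes the diagonal
agreement[low_a, low_b] = np.nan
```

Note that the full result has shape (5, 5, 3000), so agreement[a, b] is a length-3000 vector. If the nan padding is not actually needed, skipping the last two lines keeps every pairwise difference available.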