I have a 2D array mat of shape ~ (3000, 5). I want to subtract every column from each of the others, skipping the self-subtraction. This is what I have right now. I am looking for a more efficient way of doing this, something like a self-correlation. Is there a numpy or scipy function that does this in fewer steps than the usual for loop?
# one nan vector of length mat.shape[0] per (i, j) cell
agreement = np.full((mat.shape[1], mat.shape[1], mat.shape[0]), np.nan)
for i in range(agreement.shape[0] - 1):
    for j in range(i+1, agreement.shape[1]):
        A = mat[:, i]
        B = mat[:, j]
        diff = A - B
        agreement[i][j] = diff
The result is an array agreement that looks something like this:
array([[nan [......] [......] [......] [......]]
[nan nan [......] [......] [......]]
[nan nan nan [......] [......]]
[nan nan nan nan [......]]
[nan nan nan nan nan]])
Each [......] stands for the diff of the two corresponding columns and is thus of shape (3000,); it is only a placeholder, not the actual values. Each nan stands for the array [np.nan]*mat.shape[0].
np.corrcoef(mat, rowvar=False) will give you the actual correlation matrix for your 5 columns.
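As a quick check of what that call returns, here is a minimal sketch. The random mat is placeholder data standing in for the real array, which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
mat = rng.normal(size=(3000, 5))  # placeholder data, not the asker's array

# rowvar=False treats each column as a variable and each row as an observation
corr = np.corrcoef(mat, rowvar=False)
print(corr.shape)  # (5, 5)
```

Note that this yields one scalar per pair of columns, not the per-row differences asked about.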
To build the array of diffs you describe, use:
agreement = (
    np.stack([mat] * mat.shape[1], axis=1).T -
    np.stack([mat.T] * mat.shape[1], axis=0)
)
In this array, agreement[a, b] = mat[:, a] - mat[:, b], but there will be no nan in the lower triangle of the matrix: every pair is filled in, and the diagonal holds zeros.
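If the nan entries are needed, one option is to compute the differences with broadcasting and then blank out the diagonal and the lower triangle. This is only a sketch under assumptions: pairwise_diffs is a hypothetical helper name, and the random mat is placeholder data.

```python
import numpy as np

def pairwise_diffs(mat):
    """Return out with out[a, b] = mat[:, a] - mat[:, b] for a < b, nan elsewhere."""
    cols = mat.T.astype(float)                 # (n_cols, n_rows); float so nan can be stored
    out = cols[:, None, :] - cols[None, :, :]  # broadcasts to (n_cols, n_cols, n_rows)
    lower = np.tril_indices(cols.shape[0])     # index pairs on and below the diagonal
    out[lower] = np.nan                        # blanks the whole vector in each such cell
    return out

rng = np.random.default_rng(0)
mat = rng.normal(size=(3000, 5))  # placeholder data
agreement = pairwise_diffs(mat)
print(agreement.shape)  # (5, 5, 3000)
```

The upper triangle matches the np.stack version above; broadcasting just avoids materializing the repeated copies of mat before subtracting.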