
How to find the correlation between multidimensional NumPy arrays


I have a trained model. Its input is a NumPy array of shape (245, 128, 128, 13) and its output is a NumPy array of shape (245, 128, 128, 1).

245 is the number of samples.

(128, 128) is the image resolution.

13 in the input array is the number of different parameters used as input.

1 in the output array is the single channel holding the model's predicted output.

So each sample in the input array consists of 13 images of size (128, 128), one per parameter.

The output array contains a single (128, 128) image for each of the 245 samples.
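To make the indexing concrete, here is a small sketch using zero-filled placeholder arrays of the stated shapes. The names `inp` and `out` are assumed to match those used in the solution, and the data is not real model data:

```python
import numpy as np

# Placeholder arrays with the shapes described above (zeros, not real data).
inp = np.zeros((245, 128, 128, 13), dtype=np.float32)  # samples, height, width, parameters
out = np.zeros((245, 128, 128, 1), dtype=np.float32)   # samples, height, width, prediction

k, i = 0, 5
param_image = inp[k, :, :, i]  # image of parameter i for sample k, shape (128, 128)
pred_image = out[k, :, :, 0]   # predicted image for sample k, shape (128, 128)

# Every pixel of parameter i across all samples as one flat vector; it lines up
# element by element with the flattened output (both use C order over samples, rows, cols).
param_vec = inp[..., i].ravel()
out_vec = out.ravel()
print(param_vec.shape, out_vec.shape)
```

Both flattened vectors have 245 × 128 × 128 = 4,014,080 entries, which is what makes a per-parameter correlation with the output well defined.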

Now I would like to find the correlation between each of the 13 input parameters and the output. I need 13 values as the answer: one correlation value per parameter, measuring how strongly that parameter correlates with the output.

How do I compute this?


Solution

  • You can simply flatten the arrays before computing the correlation:

    from scipy.stats import pearsonr
    pearsonr(inp[:, :, :, 5].flatten(), out.flatten())
    

    For example, this computes Pearson's correlation coefficient (and the associated p-value) between channel index 5 (the sixth parameter, since indexing starts at 0) and the output.

    A somewhat overkill alternative is

    import numpy as np
    np.corrcoef([inp[:, :, :, i].flatten() for i in range(13)], out.flatten())
    

    This computes the full 14×14 matrix of correlation coefficients between all 14 vectors (the 13 inputs and the output). The last column (or, equivalently, the last row, since the matrix is symmetric) therefore holds the correlation coefficient between each input and the output, ending with a 1 for the output's correlation with itself. It's overkill because, in addition to the 13 correlations you want, it also computes the 13×12/2 = 78 coefficients between pairs of inputs that you didn't ask for.
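If only the 13 input-output values are needed, they can be computed directly without building the full 14×14 matrix. Below is a minimal sketch, assuming `inp` and `out` have the layout from the question (parameters on the last axis). The function name `input_output_corr` and the random stand-in data are illustrative only, not part of the original answer. It centers each parameter column and the output, then divides their dot products by the product of their norms, which is Pearson's r written in matrix form:

```python
import numpy as np


def input_output_corr(inp, out):
    """Pearson r between each input parameter (last axis of inp) and the output.

    Assumes inp has shape (..., n_params) and out has shape (..., 1).
    """
    # One row per pixel, one column per parameter; astype copies, so the
    # in-place centering below does not modify the caller's array.
    x = inp.reshape(-1, inp.shape[-1]).astype(np.float64)
    y = out.reshape(-1).astype(np.float64)  # same pixel order as the rows of x
    x -= x.mean(axis=0)  # center each parameter column
    y -= y.mean()        # center the output
    # r_i = sum(x_i * y) / sqrt(sum(x_i ** 2) * sum(y ** 2))
    return (x.T @ y) / np.sqrt(np.einsum("ij,ij->j", x, x) * (y @ y))


# Hypothetical stand-in data, smaller than the real (245, 128, 128, 13) arrays;
# the output is made to depend on channel 0 so one correlation is clearly non-zero.
rng = np.random.default_rng(0)
inp = rng.standard_normal((4, 32, 32, 13))
out = 2.0 * inp[..., :1] + rng.standard_normal((4, 32, 32, 1))

corr = input_output_corr(inp, out)  # shape (13,), one value per parameter

# Cross-check against the np.corrcoef approach: last row, minus the self-correlation.
ref = np.corrcoef([inp[..., i].flatten() for i in range(13)], out.flatten())[-1, :-1]
print(np.allclose(corr, ref))
```

Unlike `pearsonr`, this sketch returns only the coefficients, not p-values.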