
Correlation between 2D vector fields

Given multiple 2D flow maps, i.e. vector fields, how would one find the statistical correlation between pairs of them?

The problem:

One should not (?) reshape two flow maps flow1 and flow2 of shape (x,y,2) into 1D vectors and run

np.corrcoef(flow1.reshape(1,-1), flow2.reshape(1,-1))

since the x and y entries are connected.

Plotting yields, for visualization purposes only:

[Figure: plots of flow1 and flow2]

I am thinking about comparing magnitudes and directions.

How would one ideally compare them (cosine distance, ...)?
How would one compare the covariance between vector fields?

Edit:

I am aware that np.corrcoef(flow1.reshape(-1, 2).T, flow2.reshape(-1, 2).T) would return a 4×4 correlation coefficient matrix, but I find it unintuitive to interpret.
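A minimal sketch of how that 4×4 matrix can be read. One assumption worth stating: `reshape(2, -1)` on an (x, y, 2) array does not put the two components in separate rows (in C order the components are interleaved), so the sketch splits them with `reshape(-1, 2).T`. The rows of the matrix are then the x component of flow1, the y component of flow1, the x component of flow2 and the y component of flow2; the names `comps1`, `cross` etc. are just labels chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
flow1 = rng.uniform(size=(10, 10, 2))        # (rows, cols, components)
flow2 = flow1 + rng.uniform(size=(10, 10, 2))

# One row per component: shape (2, 100), row 0 = x component, row 1 = y component
comps1 = flow1.reshape(-1, 2).T
comps2 = flow2.reshape(-1, 2).T

# np.corrcoef stacks the rows of both inputs: order is x1, y1, x2, y2
C = np.corrcoef(comps1, comps2)
print(C.shape)          # (4, 4)

within1 = C[:2, :2]     # x1 vs y1: how the two components of flow1 relate
within2 = C[2:, 2:]     # x2 vs y2: same for flow2
cross = C[:2, 2:]       # cross[i, j]: component i of flow1 vs component j of flow2
print(np.diag(cross))   # x1 vs x2 and y1 vs y2
```

Only the off-diagonal 2×2 block says something about how the two fields relate to each other; the diagonal blocks describe each field on its own.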

Solution

For some measures of similarity it may indeed be desirable to take the spatial structure of the domain into account. But a coefficient of correlation does not do that: it is invariant under any permutation of the domain, as long as the same permutation is applied to both arrays. For example, the correlation between (0, 1, 2, 3, 4) and (1, 2, 4, 8, 16) is the same as between (1, 4, 2, 0, 3) and (2, 16, 4, 1, 8), where both arrays were reshuffled in the same way.
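The permutation example can be checked directly with `np.corrcoef`; the index array `perm` below is just the reshuffle used in the text. The comparison uses `np.isclose` because reordering the entries changes the floating-point summation order.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 2, 4, 8, 16])

perm = np.array([1, 4, 2, 0, 3])     # the same reshuffle applied to both arrays
r_original = np.corrcoef(x, y)[0, 1]
r_shuffled = np.corrcoef(x[perm], y[perm])[0, 1]

print(x[perm], y[perm])              # the reshuffled arrays from the text
print(np.isclose(r_original, r_shuffled))  # True
```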

So, the coefficient of correlation is obtained by:

1. Centering both arrays, i.e., subtracting their mean (per component). Say we get FC1 and FC2.
2. Taking the inner product of FC1 and FC2: this is just the sum of the products of matching entries.
3. Dividing by the square roots of the inner products FC1·FC1 and FC2·FC2.
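Written as a single formula, with FC1 and FC2 the centered arrays and the sums running over both grid indices i, j and the component index k:

$$
r = \frac{\sum_{i,j,k} \mathrm{FC1}_{ijk}\,\mathrm{FC2}_{ijk}}{\sqrt{\sum_{i,j,k} \mathrm{FC1}_{ijk}^{2}}\;\sqrt{\sum_{i,j,k} \mathrm{FC2}_{ijk}^{2}}}
$$

This is the ordinary Pearson coefficient of the two flattened arrays, except that the mean is subtracted separately for each component k.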

Example:

import numpy as np

flow1 = np.random.uniform(size=(10, 10, 2))     # the 3rd dimension is for the components
flow2 = flow1 + np.random.uniform(size=(10, 10, 2))
flow1_centered = flow1 - np.mean(flow1, axis=(0, 1))   # subtract the mean of each component
flow2_centered = flow2 - np.mean(flow2, axis=(0, 1))
inner_product = np.sum(flow1_centered*flow2_centered)
r = inner_product/np.sqrt(np.sum(flow1_centered**2) * np.sum(flow2_centered**2))

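Here the flows are positively correlated because flow2 includes flow1. Specifically, r is a number around 1/sqrt(2): the covariance of flow1 and flow2 equals the variance of flow1, while flow2 has twice that variance because the added noise is independent of flow1 and has the same distribution. The exact value varies with the random draw.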
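To check the 1/sqrt(2) value empirically, here is a sketch that wraps the steps of the example in a function (`field_correlation` is a name chosen here, not a library function) and repeats the experiment on larger grids. It uses a seeded `np.random.default_rng` generator instead of `np.random.uniform` so runs are reproducible; as the number of samples grows, r should settle near 1/sqrt(2) ≈ 0.7071.

```python
import numpy as np

def field_correlation(f1, f2):
    """Correlation coefficient of two (rows, cols, 2) vector fields,
    centering each component separately, as in the example above."""
    f1c = f1 - f1.mean(axis=(0, 1))
    f2c = f2 - f2.mean(axis=(0, 1))
    return np.sum(f1c * f2c) / np.sqrt(np.sum(f1c**2) * np.sum(f2c**2))

rng = np.random.default_rng(42)
for n in (10, 100, 1000):
    flow1 = rng.uniform(size=(n, n, 2))
    flow2 = flow1 + rng.uniform(size=(n, n, 2))   # flow1 plus independent noise
    print(n, field_correlation(flow1, flow2))

print(1 / np.sqrt(2))   # the limiting value
```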

If this is not what you want, then you don't want the coefficient of correlation but some other measure of similarity.
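One such alternative, along the lines of the magnitude and direction comparison mentioned in the question, is to compare the vectors pixel by pixel. The sketch below (the function names are hypothetical) averages the cosine of the angle between the two vectors at each pixel, and separately correlates the vector lengths. Both treat the two components at a pixel as one vector instead of as independent samples; note that, like the correlation coefficient, they are still unaffected by reshuffling the pixels. The `eps` guard is an assumption added to avoid dividing by zero for zero-length vectors.

```python
import numpy as np

def mean_cosine_similarity(f1, f2, eps=1e-12):
    """Average cosine of the angle between f1 and f2 at each pixel.
    f1, f2: arrays of shape (rows, cols, 2). Result lies in [-1, 1]."""
    dot = np.sum(f1 * f2, axis=-1)                      # per-pixel dot product
    norms = np.linalg.norm(f1, axis=-1) * np.linalg.norm(f2, axis=-1)
    return np.mean(dot / np.maximum(norms, eps))        # eps avoids division by zero

def magnitude_correlation(f1, f2):
    """Pearson correlation between the vector lengths of the two fields."""
    m1 = np.linalg.norm(f1, axis=-1).ravel()
    m2 = np.linalg.norm(f2, axis=-1).ravel()
    return np.corrcoef(m1, m2)[0, 1]

rng = np.random.default_rng(0)
flow1 = rng.normal(size=(10, 10, 2))
flow2 = flow1 + 0.5 * rng.normal(size=(10, 10, 2))

print(mean_cosine_similarity(flow1, flow1))    # 1.0 up to rounding
print(mean_cosine_similarity(flow1, -flow1))   # -1.0 up to rounding
print(mean_cosine_similarity(flow1, flow2))
print(magnitude_correlation(flow1, flow2))
```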