Search code examples
python-3.xnumpypython-xarraynumpy-ufunc

How to apply a Pearson Correlation Analysis over all pairs of pixels of a DataArray as a Correlation Matrix?


I am facing serious difficulties in generating a correlation matrix (pixel by pixel) of a single Netcdf with dimensions ('lon', 'lat', 'time'). My final intent is to generate what one calls a Teleconnectivity Map.

This Map is composed of correlation coefficients. Each pixel has a value that represents the highest correlation value (in module) found in the correlation matrix over all pairs of pixels in the DataArray.

Therefore, in order to create my Teleconnectivity Map, instead of looping over every longitude ('lon') and every latitude ('lat') and later checking all possible combinations of correlation for which one was higher in magnitude, I was thinking of applying the xr.apply_ufunction with a wrapped correlation function inside.

Despite my efforts, I still don't get what is truly happening behind the scenes in the xr.apply_ufunc. All I managed to do was as a single Resultant Matrix with all pixels equals to 1 (perfect correlation).

See code below:


import numpy as np

import xarray as xr

def correlation(x, y):

    return np.corrcoef(x, y)[0,0] # to return a single correlation index, instead of a matriz


def wrapped_correlation(da, x, coord='time'):
    """Finds the correlation along a given dimension of a dataarray."""


    from functools import partial

    fpartial = partial(correlation, x.values)

    return xr.apply_ufunc(fpartial, 
                          da, 
                          input_core_dims=[[coord]] , 
                          output_core_dims=[[]],
                          vectorize=True,
                          output_dtypes=[float]
                          )



# testing the wrapped correlation for a sample data:

ds = xr.tutorial.open_dataset('air_temperature').load()

# testing for a single point in space.

x = ds['air'].sel(dict(lon=1, lat=92), method='nearest')

# over all points in the DataArray

Corr_over_x = wrapped_correlation(ds['air'], x)

Corr_over_x# notice that the resultant DataArray is composed solely of ones (perfect correlation match). This is impossible. I would expect to have different values of correlation for each pixel in here

# if one would plot the data, I would be composed of a variety of correlation values (see example below):

Corr_over_x.plot()

This is an important asset for meteorologists and Remote Sensing researches. It allows the evaluation of potential geophysical patterns over a given area of study.

I thank you for your time, and I hope hearing from you soon.

Sincerely yours,


Solution

  • I have managed to solve my question. The script has become a bit long. Nevertheless, it does what it was previously intended.

    The code is adapted from this reference

    Since it is too long to show a snippet in here, I am posting a link to my Github account in which the algorithm (organized in a package named Teleconnection_using_xarray_data) can be checked here.

    The package has two modules with similar results.

    The first module (teleconnection_with_connecting_pathways) is slower than the second (teleconnection_via_numpy), but it allows to evaluate the connecting pathways between the partial teleconnection maps.

    The second, only returns the resultant teleconnection map, without the connecting lines (geopandas-Linestrings), though it is much faster.

    Feel free to colaborate. If possible, I would like to combine both modules ensuring speed and pathway analyses in the Teleconnection algorithm.

    Sincerely yours,

    Philipe Leal