Search code examples
pythonarraysnumpycorrelationnetcdf

Spatial Correlation calculation in python: Concatenation Error-Mismatched array sizes along dimension 0


I am attempting to plot the spatial correlation between two variables, SIC and spco2, which are stored in separate NetCDF files. However, the spco2 dataset has a different coordinate type than the ‘SIC’ dataset. Below is some brief information about both datasets:

Sea Ice data

Dimensions:

  • LON: 360
  • LAT: 173
  • TIME: 444
  • bnds: 2

Coordinates:

  • LON (float64): -179.5, -178.5, ..., 178.5, 179.5
  • LAT (float64): -82.5, -81.5, -80.5, ..., 88.5, 89.5
  • TIME (datetime64[ns]): 1985-01-01, ..., 2021-12-01

Data variables:

  • TIME_bnds (TIME, bnds): datetime64[ns]
  • SIC (TIME, LAT, LON): float32

Carbon Data:

Dimensions:

  • time: 444
  • latitude: 173
  • longitude: 360

Coordinates:

  • time (datetime64[ns]): 1985-01-15, ..., 2021-12-15
  • latitude (float32): -82.5, -81.5, -80.5, ..., 88.5, 89.5
  • longitude (float32): 0.5, 1.5, 2.5, ..., 357.5, 358.5, 359.5

Data variables:

  • spco2 (time, latitude, longitude): float32
  • tco2 (time, latitude, longitude): float32
  • fgco2 (time, latitude, longitude): float32
  • ph (time, latitude, longitude): float32
  • talk (time, latitude, longitude): float32

In the following the code I transformed the coordinates of carbon data from 0:360 to -180:180 so that the format matches but I am not sure if worked:

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from scipy.interpolate import griddata

# Load sea ice data
sea_ice_data = xr.open_dataset(path_ice)
sea_ice_sic = sea_ice_data['SIC']  

# Load carbon data
carbon_data = xr.open_dataset(path_carbon)
carbon_spco2 = carbon_data['spco2']  

# Convert carbon longitudes to -180 to +180 range
carbon_spco2['longitude'] = (carbon_spco2['longitude'] + 180) % 360 - 180

# Flatten latitude and longitude coordinates
carbon_coords = np.column_stack((carbon_spco2['latitude'].values.flatten(), carbon_spco2['longitude'].values.flatten()))

# Calculate spatial correlation map
correlation_map = np.empty((len(sea_ice_sic['LAT']), len(sea_ice_sic['LON'])))
for lat_idx, lat in enumerate(sea_ice_sic['LAT']):
    for lon_idx, lon in enumerate(sea_ice_sic['LON']):
        sic_values = sea_ice_sic.sel(LAT=lat, LON=lon, method='nearest').values

        # Interpolate carbon data to sea ice grid
        spco2_values = griddata(
            carbon_coords,
            carbon_spco2.values.flatten(),
            (lat, lon),
            method='nearest'
        )
        
        correlation_map[lat_idx, lon_idx] = np.corrcoef(sic_values, spco2_values)[0, 1]

# Create a Cartopy projection
projection = ccrs.PlateCarree()

# Plot the spatial correlation map using Cartopy
plt.figure(figsize=(12, 8))
ax = plt.axes(projection=projection)
ax.set_extent([-180, 180, -90, 90], crs=ccrs.PlateCarree())
ax.coastlines()

# Plot the correlation map as an image
plt.imshow(correlation_map, cmap='RdBu_r', vmin=-1, vmax=1, extent=(-180, 180, -90, 90), origin='upper', transform=ccrs.PlateCarree())

# Add a colorbar
cbar = plt.colorbar(label='Correlation Coefficient', orientation='vertical', shrink=0.7)
cbar.ax.tick_params(labelsize=10)

plt.title('Spatial Correlation between Sea Ice and Carbon')
plt.show()

However, I am ending up with the following error message:

ValueError                                Traceback (most recent call last)
<ipython-input-8-1d668216ff21> in <cell line: 20>()
     18 
     19 # Flatten latitude and longitude coordinates
---> 20 carbon_coords = np.column_stack((carbon_spco2['latitude'].values.flatten(), carbon_spco2['longitude'].values.flatten()))
     21 
     22 # Calculate spatial correlation map

2 frames
/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py in column_stack(*args, **kwargs)

/usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py in column_stack(tup)
    654             arr = array(arr, copy=False, subok=True, ndmin=2).T
    655         arrays.append(arr)
--> 656     return _nx.concatenate(arrays, 1)
    657 
    658 

/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 173 and the array at index 1 has size 360

I am assuming the regridding did not work properly. I would appreciate any help or suggestions, regarding this. I am also attaching the Google Drive links to my dataset below:

  1. Sea ice
  2. Carbon

Thank you so much for considering to take a look at my issue.


Solution

  • I can propose solution like this, although the loops make it very slow and at the moment do not check that time-moments are really the same (I just assume that time values are equal i.e. values correspond correctly in time):

    #!/usr/bin/env ipython
    # --------------------
    import xarray as xr
    import numpy as np
    # --------------------
    filein_a = 'extracted_sea_ice.nc'
    filein_b = 'Carbon-rep-monthly_1985-2021.nc'
    # ------------------------------------------
    def nc_varget(fin,vin):
        with xr.open_dataset(fin) as ncin:
            return ncin.variables[vin].values
    # ------------------------------------------
    xa = nc_varget(filein_a,'LON') 
    ya = nc_varget(filein_a,'LAT')
    # -----------------------------------------------------------
    # ===========================================================
    corrmat = np.zeros((np.size(ya),np.size(xa)))
    # -------------------------------------------
    dfin_a = xr.open_dataset(filein_a)
    dfin_b = xr.open_dataset(filein_b)
    
    for ix,xval in enumerate(xa):
        for iy,yval in enumerate(ya):
            corrmat[iy,ix] = 0.e0
            serie_a = dfin_a.sel(LON=xval,LAT=yval,method='nearest')
            serie_b = dfin_b.sel(longitude=xval,latitude=yval,method='nearest')
            # ---------------------------------------------------
            serie_a = serie_a['SIC']
            serie_b = serie_b['spco2']
            # ---------------------------------------------------
            if np.nanstd(serie_a) == 0.e0 or np.isnan(np.nanstd(serie_a))==1: continue
            if np.nanstd(serie_b) == 0.e0 or np.isnan(np.nanstd(serie_b))==1: continue
            # ---------------------------------------------------
            corrmat[iy,ix] = np.nanmin(np.corrcoef(serie_a,serie_b))
    # ===================================================================================
    ds = xr.Dataset(data_vars=dict(corrmat=(["lat", "lon"], corrmat)),coords=dict(lon=(["lon"], xa),lat=(["lat"], ya)),attrs=dict(description="Spatial correlation"))
    ds.to_netcdf('test_output.nc')