Search code examples
python-3.xcorrelationnetcdfmatplotlib-basemapnetcdf4

Calculate netcdf files' variables correlation and plot on the map with Python


Given two nc files here:

data/CMIP6_UKESM1-0-LL_Lmon_piControl_r1i1p1f2_gpp_1960-3059.nc
data/CMIP6_UKESM1-0-LL_Amon_piControl_r1i1p1f2_tas_1960-3059.nc

Read the first file:

from netCDF4 import Dataset
import numpy as np

ds1 = Dataset('data/CMIP6_UKESM1-0-LL_Lmon_piControl_r1i1p1f2_gpp_1960-3059.nc')
print(ds1.variables.keys()) # get all variable names

Out:

odict_keys(['gpp', 'time', 'time_bnds', 'lat', 'lat_bnds', 'lon', 'lon_bnds', 'clim_season', 'season_year'])

Read the second file:

ds2 = Dataset('data/CMIP6_UKESM1-0-LL_Amon_piControl_r1i1p1f2_tas_1960-3059.nc')
print(ds2.variables.keys()) 

Out:

odict_keys(['tas', 'time', 'time_bnds', 'lat', 'lat_bnds', 'lon', 'lon_bnds', 'clim_season', 'height', 'season_year'])

Check gpp variable:

gpp = ds1.variables['gpp'] # gpp variable
print(gpp)

Out:

<class 'netCDF4._netCDF4.Variable'>
float32 gpp(time, lat, lon)
    _FillValue: 1e+20
    standard_name: gross_primary_productivity_of_biomass_expressed_as_carbon
    long_name: Carbon Mass Flux out of Atmosphere Due to Gross Primary Production on Land [kgC m-2 s-1]
    units: kg m-2 s-1
    cell_methods: area: mean where land time: mean
    coordinates: clim_season season_year
unlimited dimensions: 
current shape = (3300, 144, 192)
filling on

Check tas variable:

tas = ds2.variables['tas'] # tas variable
print(tas)

Out:

<class 'netCDF4._netCDF4.Variable'>
float32 tas(time, lat, lon)
    _FillValue: 1e+20
    standard_name: air_temperature
    long_name: Near-Surface Air Temperature
    units: K
    cell_methods: area: time: mean
    coordinates: clim_season height season_year
unlimited dimensions: 
current shape = (3300, 144, 192)
filling on

Now I would like to calculate correlation between gpp and tas, then plot their correlation values on the map.

How could I do that? Thanks.


Solution

  • You should be able to do this easily using my package nctoolkit (https://nctoolkit.readthedocs.io/en/latest/).

    My understanding is that you want to plot the temporal correlation coefficient per grid-cell. In that case:

    import nctoolkit as nc
    files = ["data/CMIP6_UKESM1-0-LL_Lmon_piControl_r1i1p1f2_gpp_1960-3059.nc",
    "data/CMIP6_UKESM1-0-LL_Amon_piControl_r1i1p1f2_tas_1960-3059.nc"]
    ds = nc.open_data(files)
    ds.merge()
    ds.cor_time(var1 = "gpp", var2 =  "tas")
    ds.plot()
    

    If you wanted the spatial correlation coefficient between variables per time step, you can use the following line in place of the second last one:

    ds.cor_space(var1 = "gpp", var2 =  "tas")