Tags: python, mask, netcdf, shapefile

Mask NetCDF using a shapefile and calculate the average and anomaly for all polygons within the shapefile


There are several tutorials (example 1, example 2, example 3) about masking NetCDF files with a shapefile and calculating average measures. However, I found those workflows confusing, and none of them covers extracting an anomaly (for example, the difference between the temperature in 2019 and a baseline average temperature).

Here is an example. I have downloaded monthly temperature data (download temperature file) from 2000 to 2019 and a state-level US shapefile (download shapefile). I want the state-level average temperature based on the monthly means from 2000 to 2019, and the temperature anomaly of 2019 relative to the 2000-2010 baseline. Specifically, the final dataframe should look as follows:

state avg_temp anom_temp2019
AL xx xx
AR xx xx
... ... ...
WY xx xx
# Load libraries
%matplotlib inline

import regionmask
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Read shapefile
us = gpd.read_file('./shp/state_cus.shp')

# Read gridded data
ds = xr.open_mfdataset('./temp/monthly_mean_t2m_*.nc')
......

I would really appreciate help with an explicit workflow for the task above. Thanks a lot.


Solution

  • This can be achieved using regionmask. I don't use your files here but the xarray tutorial data and Natural Earth data for the US states.

    import numpy as np
    import regionmask
    import xarray as xr
    
    # load polygons of US states
    us_states_50 = regionmask.defined_regions.natural_earth.us_states_50
    
    # load an example dataset
    air = xr.tutorial.load_dataset("air_temperature")
    
    # turn into monthly time resolution
    air = air.resample(time="M").mean()
    
    # create a mask
    mask3D = us_states_50.mask_3D(air)
    
    # latitude weights
    wgt = np.cos(np.deg2rad(air.lat))
    
    # calculate regional averages
    reg_ave = air.weighted(mask3D * wgt).mean(("lat", "lon"))
    
    # calculate the average temperature (over 2013-2014)
    avg_temp = reg_ave.sel(time=slice("2013", "2014")).mean("time")
    
    # calculate the anomaly (w.r.t. 2013-2014)
    reg_ave_anom = reg_ave - avg_temp
    
    # select a single timestep (January 2013)
    reg_ave_anom_ts = reg_ave_anom.sel(time="2013-01")
    
    # remove the time dimension
    reg_ave_anom_ts = reg_ave_anom_ts.squeeze(drop=True)
    
    # convert to a pandas dataframe so it's in tabular form
    df = reg_ave_anom_ts.air.to_dataframe()
    
    # set the state codes as index
    df = df.set_index("abbrevs")
    
    # remove other columns
    df = df.drop(columns="names")
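
    Adapted to the periods in the question, the same pattern could look roughly like the sketch below. It builds on the code above but assumes reg_ave has been recomputed from your own monthly files covering 2000-2019, that the temperature variable is named t2m, and that your regions carry "abbrevs" and "names" coordinates (all of these are assumptions, adjust to your data).

    # --- sketch only: periods and the variable name "t2m" are assumptions ---

    # average temperature over the full 2000-2019 period
    avg_temp = reg_ave.sel(time=slice("2000", "2019")).mean("time")

    # baseline temperature over 2000-2010
    baseline = reg_ave.sel(time=slice("2000", "2010")).mean("time")

    # 2019 annual mean and its anomaly relative to the baseline
    anom_2019 = reg_ave.sel(time="2019").mean("time") - baseline

    # combine both measures into one table: state | avg_temp | anom_temp2019
    df = xr.Dataset(
        {"avg_temp": avg_temp["t2m"], "anom_temp2019": anom_2019["t2m"]}
    ).to_dataframe()
    df = df.set_index("abbrevs").drop(columns="names")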
    

    You can find info on how to use your own shapefile in the regionmask docs (Working with geopandas).
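
    A minimal sketch of that step, using the shapefile from the question and assuming it has "NAME" and "STUSPS" columns (both column names are assumptions, adjust to whatever your file actually contains):

    import geopandas as gpd
    import regionmask

    # read the shapefile from the question
    us = gpd.read_file("./shp/state_cus.shp")

    # build a regionmask.Regions object from the GeoDataFrame
    # (the column names "NAME" and "STUSPS" are assumptions)
    us_states = regionmask.from_geopandas(
        us, names="NAME", abbrevs="STUSPS", name="us_states"
    )

    # this object can then be used exactly like us_states_50 above,
    # e.g. with the gridded dataset ds from the question
    mask3D = us_states.mask_3D(ds)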

    disclaimer: I am the main author of regionmask.