Search code examples
pythonnetcdf

Problems clipping netCDF file


I am trying to get a proof of concept going whereas I am taking a netCDF file from NOAA that has worldwide precip data for a particular year and "repartition" the data down to US state level. I keep running into a situation where there's no data after performing the clip.

Using a shape file from https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip which has all states in a single file.

Using the netCDF file from ftp://ftp.cdc.noaa.gov/Datasets/cpc_global_precip/precip.1988.nc.

Here's my code.

#!/usr/bin/python

import xarray
import geopandas
import glob
import os, sys, getopt
from shapely.geometry import mapping

def partition(shapeFile, ncDirectory, outDirectory, stateName):
    # Load the shape file that either has all the states or just one named state
    sf_all = geopandas.read_file(shapeFile)

    # Filter to our state
    sf = sf_all.loc[sf_all['NAME'] == stateName]

    # Get all the nc files in the ncDirectory so we can repartition all the models
    file_list = sorted(glob.glob(f'{ncDirectory}/*.nc'))

    for file in file_list:
        print(f'Working on: {outDirectory}/{stateName}/{os.path.basename(file)}')

        # Open the file and do some basics
        nc_file = xarray.open_dataset(file)
        nc_file.rio.write_crs("epsg:4326", inplace=True)
        nc_file.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)

        # Convert lon from 0,360 to -180,180 if needed
        if any(nc_file.coords['lon'] > 180 ):
            nc_file.coords['lon'] = (nc_file.coords['lon'] + 180) % 360 - 180
        
        # Convert lat from 0,360 to -180,180 if needed
        if any(nc_file.coords['lat'] > 180 ):
            nc_file.coords['lat'] = (nc_file.coords['lat'] + 180) % 360 - 180

        # Perform the actual clipping here based on the geometry of the {sf} variable which should just be our state
        clipped_nc = nc_file.rio.clip(sf.geometry.apply(mapping), crs="epsg:4326", drop=True)

        if os.path.exists(f'{outDirectory}/{stateName}/{os.path.basename(file)}'):
            os.removedirs(f'{outDirectory}/{stateName}/{os.path.basename(file)}')
        
        # Save the file to a new netCDF representing the state/model combination
        clipped_nc.to_netcdf(f'{outDirectory}/{stateName}/{os.path.basename(file)}')

def main(argv):
    shapeFile = ''
    stateName = ''
    ncDirectory = ''
    outDirectory = ''

    opt, args = getopt.getopt(argv,"hf:n:o:s:",["shapedir=","ncdirectory=","outputdir=", "statename="])
    for opt, arg in opt:
        if opt =='-h':
            print('repartition.py -f <shapefile> -n <ncfiledirectory> -o <outputdir> -s <statename>')
            quit(0)
        elif opt in ("-f", "--shapefile"):
            shapeFile = arg.rstrip("/")
        elif opt in ("-n", "--ncdirectory"):
            ncDirectory = arg.rstrip("/")
        elif opt in ("-o", "--outputdir"):
            outDirectory = arg.rstrip("/")
        elif opt in ("-s","--statenames"):
            stateName = arg
        
    isExists = os.path.exists(f'{outDirectory}/{stateName}')
    if not isExists:
        os.makedirs(f'{outDirectory}/{stateName}')
    
    partition(shapeFile,ncDirectory,outDirectory, stateName)

if __name__ == "__main__":
    main(sys.argv[1:])

I am not normally a python user, I am actually more on the lines of C/C++/C#, but python is the requirement so running with it. Other requirement, is this is a Windows environment (not my choice). Maybe someday I will convince the powers to be that running Linux containers is not a bad thing and can be done in Azure so we can stay a "Microsoft Shop", but for now, Windows is it.


Solution

  • The problem lies in the conversion of the longitude from the range [0, 360] to [–180, 180]. The longitudes must be sorted in increasing order, but by converting those above 180° to negative values, they are not sorted anymore. Instead, after the conversion, the longitudes go from 0 to 180 and then from –180 to –0. It seems as if clipping does not work in this case.

    Since the US mainland is in negative longitudes, a simple workaround would be to remove everything between longitudes 0 and +180. Then you're left with only negative longitudes, increasing from –180 to 0. This can be done by adding the following line after opening the dataset with xarray:

    nc_file = nc_file.where(nc_file.lon > 180, drop=True)
    

    In this case, instead of the calculation with the modulo operator %, you can simply convert to longitudes in the desired range by:

    nc_file["lon"] = nc_file.lon - 360
    

    And I would suggest removing the conversion for latitudes, because they are always given with values between –90° and +90°.

    So, a minimal working example (MWE) could be:

    import xarray
    import geopandas
    from shapely.geometry import mapping
    
    stateName = "Alaska"
    
    sf_all = geopandas.read_file("cb_2018_us_state_500k/cb_2018_us_state_500k.shp")
    sf = sf_all.loc[sf_all["NAME"] == stateName]
    
    nc_file = xarray.open_dataset("precip.1988.nc")
    
    # Keep only longitudes between 180° and 360°
    nc_file = nc_file.where(nc_file.lon > 180, drop=True)
    # Convert longitudes from 180,360 to -180,0
    nc_file["lon"] = nc_file.lon - 360
    
    nc_file.rio.write_crs("epsg:4326", inplace=True)
    nc_file.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)
    clipped_nc = nc_file.rio.clip(sf.geometry.apply(mapping), crs="epsg:4326", drop=True)
    
    # Show the result
    clipped_nc.precip.mean("time").plot()
    import matplotlib.pyplot as plt; plt.show()
    

    Hope this helps.