Search code examples
rnetcdfterra

Matching data structure between two netCDF files


I have two netCDF files:

WATCH data:

> watch_Tair:

     2 variables (excluding dimension variables):
        int timestp[tstep]   
            title: time steps
            units: time steps (days) since 2018-01-01 00:00:00
            long_name: time steps (days) since start of month
        float Tair[lon,lat,tstep]   
            title: Tair average
            units: K
            long_name: Average Near surface air temperature 
 at 2 m at time stamp
            actual_max: 314.173614501953
            actual_min: 212.381393432617
            _FillValue: 1.00000002004088e+20

     4 dimensions:
        lon  Size:720 
            title: Longitude
            units: grid box centre degrees_east
            actual_max: 179.75
            actual_min: -179.75
        lat  Size:360 
            title: Latitude
            units: grid box centre degrees_north
            actual_max: 89.75
            actual_min: -89.75
        tstep  Size:31   *** is unlimited *** (no dimvar)
        day  Size:31 
            title: day
            units: integer
            long_name: day of the month

ERA5 data:

> era5_Tair:

     1 variables (excluding dimension variables):
        short t2m[longitude,latitude,time]   
            scale_factor: 0.00169511169020999
            add_offset: 265.715689919741
            _FillValue: -32767
            missing_value: -32767
            units: K
            long_name: 2 metre temperature

     3 dimensions:
        longitude  Size:3600 
            units: degrees_east
            long_name: longitude
        latitude  Size:1801 
            units: degrees_north
            long_name: latitude
        time  Size:744 
            units: hours since 1900-01-01 00:00:00.0
            long_name: time
            calendar: gregorian

The aim is for ERA5 to match WATCH data structure. I do some pre-processing on the ERA5 file to get the same spatial and temporal aggregation as for WATCH data:

#*Daily aggregation*
era5_Tair_daily <- tapp(era5_Tair, "days", mean) 

#`Spatial aggregation`
era5_temp_resampled <- terra::resample(era5_Tair_daily, watch_Tair, method = "bilinear")

Then, I save it back as a netcd file:

writeCDF(era5_temp_resampled, filename = "./era5_temp.nc", varname = "Tair", unit="K", zname="time", overwrite = TRUE)

> era5_temp_resample:

     2 variables (excluding dimension variables):
        float Tair[longitude,latitude,time]   (Contiguous storage)  
            units: K
            _FillValue: -1.17549402418441e+38
            long_name: Average Near surface air temperature at 2 m at time stamp
            grid_mapping: crs
        int crs[]   (Contiguous storage)  
            crs_wkt: GEOGCRS["unknown",    DATUM["World Geodetic System 1984",        ELLIPSOID["WGS 84",6378137,298.257223563,            LENGTHUNIT["metre",1]],        ID["EPSG",6326]],    PRIMEM["Greenwich",0,        ANGLEUNIT["degree",0.0174532925199433],        ID["EPSG",8901]],    CS[ellipsoidal,2],        AXIS["longitude",east,            ORDER[1],            ANGLEUNIT["degree",0.0174532925199433,                ID["EPSG",9122]]],        AXIS["latitude",north,            ORDER[2],            ANGLEUNIT["degree",0.0174532925199433,                ID["EPSG",9122]]]]
            spatial_ref: GEOGCRS["unknown",    DATUM["World Geodetic System 1984",        ELLIPSOID["WGS 84",6378137,298.257223563,            LENGTHUNIT["metre",1]],        ID["EPSG",6326]],    PRIMEM["Greenwich",0,        ANGLEUNIT["degree",0.0174532925199433],        ID["EPSG",8901]],    CS[ellipsoidal,2],        AXIS["longitude",east,            ORDER[1],            ANGLEUNIT["degree",0.0174532925199433,                ID["EPSG",9122]]],        AXIS["latitude",north,            ORDER[2],            ANGLEUNIT["degree",0.0174532925199433,                ID["EPSG",9122]]]]
            proj4: +proj=longlat +datum=WGS84 +no_defs
            geotransform: -80 0.5 0 15 0 -0.5

     3 dimensions:
        longitude  Size:100 
            units: degrees_east
            long_name: longitude
        latitude  Size:70 
            units: degrees_north
            long_name: latitude
        time  Size:31 
            units: days since 1970-1-1
            long_name: time
            calendar: standard

However, in the watch data, the nc file has as variables Tair + timestp and ERA5 (Tair + crs). How can I get the exact same structure?


Solution

  • The short answer is: you can't.

    At least not with the terra package. Insofar as I know, terra will always add a crs variable (a so-called grid mapping variable) because georeferencing data is its bread-and-butter. In this specific case the crs variable is not required by the CF Metadata Conventions (that all climate and weather datasets should reasonably adhere too) because the horizontal coordinates are in longitude and latitude. The question, however, is: why bother? It certainly doesn't impact the Tair variable, even though it now has a (required) attribute grid_mapping: crs.

    terra also has no way of "knowing" that you want variable timestp in your output file and quite possibly no way of doing that at all (since it is not spatial data). If you want to copy this variable over then you should use a low-level netCDF package like RNetCDF or ncdf4 to lift the variable out of your WATCH data and then copy it into the new file.

    In general, your WATCH data has multiple deficiencies from the viewpoint of the CF Metadata Conventions. These conventions were designed exactly to support the kind of things you want to do here: multiple inputs, combine, process, store. You should inform the WATCH data provider that they should make their data compliant with the conventions. It will make your life a lot easier because most software that can process netCDF data assumes compliance with the CF or some other set of conventions.