Search code examples
copynetcdfclippingnetcdf4terra

Clipping netCDF file to a shapefile and cloning the metadata variables in R


I have NetCDF files (e.g https://data.ceda.ac.uk/neodc/esacci/lakes/data/lake_products/L3S/v1.0/2019 global domain), and I want to extract the data based on a shapefile boundary ( in this case a Lake here - https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd) and then save clipped data as a NetCDF file but retain all the original metadata and variables names within the clipped file. This is what I have done far

library(rgdal)
library(sf)
library(ncdf4)
library(terra)

#Read in the shapefile of Lake 
Lake_shape <- readOGR("C:/Users/CEDA/hydro_p_LakeA/hydro_p_A.shp")
# Reading the netcdf file using Terra Package function rast
test <- rast("ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190705-fv1.0.nc")
# List of some of variables names for orginal dataset 
      head(names(test))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty"           "lake_surface_water_extent"                 
[4] "lake_surface_water_extent_uncertainty"      "lake_surface_water_temperature"             "lswt_uncertainty"   
                                 
#Clipping data to smaller Lake domain using the crop function in Terra Package
test3 <- crop(test, Lake_shape)
#Listing the some variables names for clipped data
head(names(test3))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty"           "lake_surface_water_extent"                 
[4] "lake_surface_water_extent_uncertainty"      "lake_surface_water_temperature"             "lswt_uncertainty" 


# Writing the crop dataset as netcdf or Raster Layer using the WriteCDF function 

filepath<-"Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0"
fname <- paste0( "C:/Users/CEDA/",filepath,".nc")
rnc <- writeCDF(test3, filename =fname, overwrite=T)”

My main issue here when I read in clipped the netCDF file I don’t seem to be able to keep the names of the data variables of the original NetCDF. They are all being renamed automatically when I am saving the clipped dataset as a new netCDF using the writeCDF function.

#Reading in the new clipped file
 LakeA<-rast("Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0.nc")
> head(names(LakeA))
[1] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_1" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_2"
[3] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_3" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_4"
[5] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_5" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_6"

So is it possible to clone/copy all the metadata variables from the original NetCDF dataset when clipping to the smaller domain/shapefile in R, then saving as NetCDF? Any guidance on how to do this in R would be really appreciated. (NetCDF and R are all new to me so I am not sure what I am missing or have the in-depth knowledge to sort this).


Solution

  • You have a NetCDF file with many (52) variables (sub-datasets). When you open the file with rast these become "layers". Alternatively you can open the file with sds to keep the sub-dataset structure but that does not help you here (and you would need to skip the first two, see below).

    library(terra)
    f <- "ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc"
    r <- rast(f)
    r
    #class       : SpatRaster 
    #dimensions  : 21600, 43200, 52  (nrow, ncol, nlyr)
    #resolution  : 0.008333333, 0.008333333  (x, y)
    #extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
    #sources     : ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_above_reference_datum  
                  ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_uncertainty  
                  ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:lake_surface_water_extent  
                  ... and 49 more source(s)
    #varnames    : water_surface_height_above_reference_datum (water surface height above geoid) 
                  water_surface_height_uncertainty (water surface height uncertainty) 
                  lake_surface_water_extent (Lake Water Extent) 
                  ...
    #names       : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ... 
    #unit        :           m,           m,         km2,         km2,      Kelvin,      Kelvin, ... 
    #time        : 2019-01-01 
    

    Note that there are 52 layers and sources (sub-datasets). There are names

    head(names(r))
    #[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty"          
    #[3] "lake_surface_water_extent"                  "lake_surface_water_extent_uncertainty"     
    #[5] "lake_surface_water_temperature"             "lswt_uncertainty"                          
    

    And also "longnames" (they are often much longer than the variable names, not in this case)

    head(longnames(r))
    # [1] "water surface height above geoid" "water surface height uncertainty" "Lake Water Extent"               
    # [4] "Water extent uncertainty"         "lake surface skin temperature"    "Total uncertainty"               
    

    You can also open the file with sds, but you need to skip "lon_bounds" and "lat_bounds" variables (dimensions)

    s <- sds(f, 3:52)
    

    Now read a vector data set (shapefile in this case) and crop

    lake <- vect("hydro_p_LakeErie.shp")
    rc <- crop(r, lake)
    rc 
    
    #class       : SpatRaster 
    #dimensions  : 182, 555, 52  (nrow, ncol, nlyr)
    #resolution  : 0.008333333, 0.008333333  (x, y)
    #extent      : -83.475, -78.85, 41.38333, 42.9  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
    #source      : memory 
    #names       : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ... 
    #min values  :         NaN,         NaN,         NaN,         NaN,     271.170,       0.283, ... 
    #max values  :         NaN,         NaN,         NaN,         NaN,     277.090,       0.622, ... 
    #time        : 2019-01-01 
     
    

    It can be convenient to save this to a GTiff file like this (or even better to use the filename argument in crop)

    gtf <- writeRaster(rc, "test.tif", overwrite=TRUE)
    gtf
    #class       : SpatRaster 
    #dimensions  : 182, 555, 52  (nrow, ncol, nlyr)
    #resolution  : 0.008333333, 0.008333333  (x, y)
    #extent      : -83.475, -78.85, 41.38333, 42.9  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
    #source      : test.tif 
    #names       : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ... 
    #min values  :         NaN,         NaN,         NaN,         NaN,     271.170,       0.283, ... 
    #max values  :         NaN,         NaN,         NaN,         NaN,     277.090,       0.622, ... 
    

    What has changed is that the data are now in a file, rather then in memory. And you still have the layer (variable) names.

    To write the layers as variables to a NetCDF file you need to create a SpatRasterDataset. You can do that like this:

    x <- as.list(rc)
    s <- sds(x)
    names(s) <- names(rc)
    longnames(s) <- longnames(r)
    units(s) <- units(r)
    

    Note the use of longnames(r) and units(r) (not rc). This is because r has subdatasets (and each has a longname and a unit) while rc does not.

    Now use writeCDF

    z <- writeCDF(s, "test.nc", overwrite=TRUE)
     
    rc2 <- rast("test.nc")
    rc2
    
    #class       : SpatRaster 
    #dimensions  : 182, 555, 52  (nrow, ncol, nlyr)
    #resolution  : 0.008333333, 0.008333333  (x, y)
    #extent      : -83.475, -78.85, 41.38333, 42.9  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
    #sources     : test.nc:water_surface_height_above_reference_datum  
                  test.nc:water_surface_height_uncertainty  
                  test.nc:lake_surface_water_extent  
                  ... and 49 more source(s)
    #varnames    : water_surface_height_above_reference_datum (water surface height above geoid) 
                  water_surface_height_uncertainty (water surface height uncertainty) 
                  lake_surface_water_extent (Lake Water Extent) 
                  ...
    #names       : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ... 
    #unit        :           m,           m,         km2,         km2,      Kelvin,      Kelvin, ... 
    #time        : 2019-01-01 
    

    So it looks like we have a NetCDF with the same structure.

    Note that the current CRAN version of terra drops the time variable if there is only one time step. The development version (1.3-11) keeps the time dimension, even of there is only one step.

    You can install the development version with install.packages('terra', repos='https://rspatial.r-universe.dev')