Search code examples
rrasterrgdalgrib

Metadata is lost when reading a GRIB2 file


I'd like to read in a GRIB2 file to R, but have been unable to install wgrib2 (after several hours of struggling) meaning that rNOMADS is not an option. That's okay, as GRIB2 files can be read by both the raster and rgdal packages. The problem I run into is that the names of the layers are stripped when reading in the file.

Here's an example.

# Load libraries
library(raster)
library(rgdal)

# Name of file
file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"

# Load as raster brick
b <- brick(file_name)

# Get layer names
names(b)
# [1] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.1" 
# [2] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.2" 
# [3] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.3" 
# [4] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.4" 
# [5] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.5" 
# [6] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.6" 
# [7] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.7" 
# [8] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.8" 
# [9] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.9" 
#[10] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.10"

As you can see, the name are just generic defaults. Next, I tried .

# Load using rgdal
r <- readGDAL(file_name)

# Get names
names(r)

# [1] "band1"  "band2"  "band3"  "band4"  "band5"  "band6"  "band7"  "band8" 
# [9] "band9"  "band10"

Once again, default names. But, if I use the command line utility ncl_convert2nc to convert the GRIB2 file to NetCDF and then read in the NetCDF file with ncdf4 – an additional conversion step that I don't want to include in my workflow if it can be avoided – there are definitely variable names present.

# [1] "UOGRD_P0_L160_GLL0" "VOGRD_P0_L160_GLL0" "ICEC_P0_L1_GLL0"   
# [4] "ICETK_P0_L1_GLL0"   "UICE_P0_L1_GLL0"    "VICE_P0_L1_GLL0"   
# [7] "ICETMP_P0_L1_GLL0"  "ICEPRS_P0_L1_GLL0"  "CICES_P0_L1_GLL0"  
#[10] "WTMP_P0_L1_GLL0"   

QUESTION: Is there a way to extract or retain the variable/layer names when using rgdal or raster to read a GRIB2 file?


PS The reason I need to get the variable names from the file is because the layers don't match up with the order of layers as specified on the website when loaded with (e.g.) raster. This is apparent from the variable values. While I could use the variable names gleaned from the NetCDF file shown above, if the order of the layers changed this would break my package.


Solution

  • You can use the terra package instead of raster.

    file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"
    b <- basename(file_name)
    if (!file.exists(b)) download.file(file_name, b, mode="wb")
    library(terra)
    r <- rast(b)
    r
    #class       : SpatRaster 
    #dimensions  : 325, 500, 10  (nrow, ncol, nlyr)
    #resolution  : 0.03, 0.02  (x, y)
    #extent      : -71.015, -56.015, 45.49, 51.99  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +R=6371229 +no_defs 
    #source      : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2 
    #names       : 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[m] DBSL="Depth below sea level", 0[m] DBSL="Depth below sea level", ... 
    

    But the variable names do not match yours

    names(r)
    # [1] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
    # [4] "0[-] SFC=\"Ground or water surface\"" "0[m] DBSL=\"Depth below sea level\""  "0[m] DBSL=\"Depth below sea level\"" 
    # [7] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
    #[10] "0[-] SFC=\"Ground or water surface\""
    

    You can set the names to other pieces of the metadata

     nms <- trimws(grep("GRIB_ELEMENT=", desc(b), value=TRUE))
     names(r) <- gsub("GRIB_ELEMENT=", "", nms)
    
    r
    #class       : SpatRaster 
    #dimensions  : 325, 500, 10  (nrow, ncol, nlyr)
    #resolution  : 0.03, 0.02  (x, y)
    #extent      : -71.015, -56.015, 45.49, 51.99  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +R=6371229 +no_defs 
    #source      : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2 
    #names       : ICEC, ICETK, UICE, VICE, UOGRD, VOGRD, ..
    
    names(r)
    #[1] "ICEC"   "ICETK"  "UICE"   "VICE"   "UOGRD"  "VOGRD"  "WTMP"   "ICET"   "ICEPRS" "CICES" 
    

    I can change the behavior of terra such that it uses "GRIB_ELEMENT" (please let me know if that makes sense). But it is not clear to me how to get to the names you show. For example, below is the GDAL metadat for the first layer. You show ICEC_P0_L1_GLL0. All names have P0 and GLL0 so at least for this file, these seem redundant. But what does L1 refer to?

    d <-desc(b) 
    d[35:46]
    # [1] "    GRIB_COMMENT=Ice cover [Proportion]"                                                                                                                                                
    # [2] "    GRIB_DISCIPLINE=10(Oceanographic_Products)"                                                                                                                                         
    # [3] "    GRIB_ELEMENT=ICEC"                                                                                                                                                                  
    # [4] "    GRIB_FORECAST_SECONDS=3600 sec"                                                                                                                                                     
    # [5] "    GRIB_IDS=CENTER=54(Montreal) SUBCENTER=0 MASTER_TABLE=4 LOCAL_TABLE=0 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2020-12-01T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)"
    # [6] "    GRIB_PDS_PDTN=0"                                                                                                                                                                    
    # [7] "    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=2 0 2 56 56 0 0 1 1 1 0 0 255 -127 -2147483647"                                                                                                  
    # [8] "    GRIB_PDS_TEMPLATE_NUMBERS=2 0 2 56 56 0 0 0 1 0 0 0 1 1 0 0 0 0 0 255 255 255 255 255 255"                                                                                          
    # [9] "    GRIB_REF_TIME=  1606780800 sec UTC"                                                                                                                                                 
    #[10] "    GRIB_SHORT_NAME=0-SFC"                                                                                                                                                              
    #[11] "    GRIB_UNIT=[Proportion]"                                                                                                                                                             
    #[12] "    GRIB_VALID_TIME=  1606784400 sec UTC"    
    

    I see that "UOGRD" and "VOGRD" have L160 and have the number 160 in "GRIB_PDS_TEMPLATE_ASSEMBLED_VALUE" and "GRIB_PDS_TEMPLATE_NUMBERS" where the others have 1.

    The metadata structure is described here but I could use some guidance about where to look to understand what to extract from the metadata.