
How to manipulate NetCDF-4 groups in R?


My goal is to use the NetCDF-4 standard to create large files containing diverse data and metadata. I haven't found comprehensive documentation on this topic, so I have written this short guide to share my findings.

Specifically, I want to create and manipulate groups in a NetCDF-4 file: define dimensions, attributes, and variables inside each group, then read them back.


Solution

  • To answer your question, here is a short guide.

    This guide uses the RNetCDF package:

    library(RNetCDF)
    

    The RNetCDF package can create, organize, and read NetCDF-4 files that contain groups.

    Official documentation: https://cran.r-project.org/web/packages/RNetCDF/RNetCDF.pdf

    First, you need a path where the NetCDF-4 file will be written and read. The examples below call it NC_FILE_PATH.

    Create groups with variables, attributes, and dimensions

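    # Hypothetical output path (an assumption for this guide); replace with your own
    NC_FILE_PATH <- "groups_example.nc"
    
    # ==== Fake data ====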
    time_values <- c(1,2,3,4,5,6,7,8,9)
    pr_values <- c(11,12,13,14,15,16,17,18,19)
    lat <- 14.65
    lon <- 17.88
    
    # Creation of the NetCDF file with NetCDF-4 format
    nc_out <- create.nc(NC_FILE_PATH, format = "netcdf4")
    # Close temporarily to avoid conflicts
    close.nc(nc_out)
    
    # Opening the NetCDF file
    nc_out <- open.nc(NC_FILE_PATH, write = TRUE)
    
    # Name each group after its index: "index_1", "index_2", ...
    for(index in 1:10){
        group <- grp.def.nc(nc_out, paste0("index_", index))
    
        # Define "time" dimension
        dim.def.nc(group, "time", length(time_values))
        
        # Define variables inside of the group with "time" dimension
        var.def.nc(group,"time", "NC_DOUBLE", "time")
        var.def.nc(group,"pr_mm", "NC_DOUBLE", "time")
        
        # Add an attribute "units" to the variable "time"
        att.put.nc(group, "time", "units", "NC_CHAR", "days since 1970-01-01")
    
        # Add group global attributes latitude and longitude
        att.put.nc(group,"NC_GLOBAL","latitude","NC_DOUBLE",lat)
        att.put.nc(group,"NC_GLOBAL","longitude","NC_DOUBLE",lon)
    
        # Write values to the variables
        var.put.nc(group,"pr_mm", pr_values)
        var.put.nc(group,"time", time_values)
    }
    
    # Close the NetCDF file
    close.nc(nc_out)
    

    Read group information: variables, attributes, and dimensions

    RNetCDF works with pointers: each group is represented by a pointer (a handle) that is passed to the other functions in place of the file handle. grp.inq.nc() returns a list describing the group, and the pointer is stored in the self element of that list. So whenever you look up a group with grp.inq.nc(), take $self from the result before reading anything from the group.
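
    As a minimal sketch of what that list looks like, the snippet below builds a throwaway file with one group and inspects the result of grp.inq.nc(). The temporary path tmp_path and the variable names created, info, and stored_lat are assumptions made for this illustration only; they are not part of the guide's main example.

```r
library(RNetCDF)

# Hypothetical scratch file, used only for this illustration
tmp_path <- tempfile(fileext = ".nc")
nc <- create.nc(tmp_path, format = "netcdf4")

# grp.def.nc() returns the handle of the new group directly
created <- grp.def.nc(nc, "index_1")

# grp.inq.nc() returns a list describing the group
info <- grp.inq.nc(nc, "index_1")
print(names(info))   # the elements include "self", "grps" and "name"
print(info$name)

# The handle stored in $self can be passed to the other RNetCDF functions
att.put.nc(info$self, "NC_GLOBAL", "latitude", "NC_DOUBLE", 14.65)

# Reading through the handle returned by grp.def.nc() reaches the same group
stored_lat <- att.get.nc(created, "NC_GLOBAL", "latitude")

close.nc(nc)
```

    Both handles refer to the same group, which is why the attribute written through info$self can be read back through created.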

    In this example we use the file created above.

    # Open the file (read-only by default)
    nc <- open.nc(NC_FILE_PATH)
    
    # Extract time and pr_mm values from the group named 'index_1'
    groupe <- grp.inq.nc(nc,"index_1")$self
    
    group_data <- read.nc(groupe) # read the whole group once
    time_values <- group_data$time
    pr_values <- group_data$pr_mm
    
    # Extract attributes from group with name 'index_1'
    groupe <- grp.inq.nc(nc,"index_1")$self
    
      # Get group global attributes
      lat <- att.get.nc(groupe,"NC_GLOBAL","latitude")
      lon <- att.get.nc(groupe,"NC_GLOBAL","longitude")
    
      # Get time attribute
      time_units <- att.get.nc(groupe, "time", "units")
    
    # Extract dimension from group with name 'index_1' 
    groupe <- grp.inq.nc(nc,"index_1")$self
    
    time_dim <- dim.inq.nc(groupe, "time")
    
    length_dim <- time_dim$length # Length of the dimension
    name_dim <- time_dim$name     # Name of the dimension
    
    # Close the NetCDF file
    close.nc(nc)
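
    read.nc() loads every variable of a group at once. For large files it can be preferable to read one variable, or only part of it, with var.get.nc() and its start and count arguments. The sketch below is one way to do that; it builds its own small file so that it runs on its own, and the names tmp_path, grp, pr_all, and pr_slice are assumptions made for this illustration.

```r
library(RNetCDF)

# Hypothetical scratch file with a single group, built only for this sketch
tmp_path <- tempfile(fileext = ".nc")
nc <- create.nc(tmp_path, format = "netcdf4")
grp <- grp.def.nc(nc, "index_1")
dim.def.nc(grp, "time", 9)
var.def.nc(grp, "pr_mm", "NC_DOUBLE", "time")
var.put.nc(grp, "pr_mm", c(11, 12, 13, 14, 15, 16, 17, 18, 19))
close.nc(nc)

# Reopen read-only and fetch the group handle
nc <- open.nc(tmp_path)
grp <- grp.inq.nc(nc, "index_1")$self

# Read the whole variable
pr_all <- var.get.nc(grp, "pr_mm")

# Read only 4 values, starting at position 3 (indices are 1-based)
pr_slice <- var.get.nc(grp, "pr_mm", start = 3, count = 4)

close.nc(nc)
```

    Only the requested values are read from disk, so the group's other variables are never loaded into memory.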
    

    How to browse all groups in a NetCDF file

    The goal of the code below is to get a NetCDF group pointer into the variable groupe. With this pointer, you can retrieve any information you need about the group.

    By using the group list

    First, we loop over all groups to get each groupe pointer and, as an example, print its name. grp.inq.nc(nc) returns the list of groups in its grps element, and the for loop walks through that list by index. In short, given a NetCDF file, the loop extracts a property of each group (here, the name) from its position in the list of groups.

    # Opening the NetCDF file
    nc <- open.nc(NC_FILE_PATH)
    
    # Print the name of every group directly under the root group
    nc_global <- grp.inq.nc(nc)
    for (grp_index in seq_along(nc_global$grps)){
      groupe <- nc_global$grps[[grp_index]] # 'groupe' is the grp_index-th pointer in the list of groups
      print(grp.inq.nc(groupe)$name) # display the group name returned by grp.inq.nc()
    }
    
    # Close the NetCDF file
    close.nc(nc)
    

    To visit every group and subgroup in a NetCDF file, you can use a recursive function.

    The recursive function below prints the names of all groups and subgroups.

    list_groups_recursively <- function(grp) {
      # Display group name
      print(grp.inq.nc(grp)$name)
      
      # If the group has subgroups, call the function on each of them
      sub_groups <- grp.inq.nc(grp)$grps
      if (length(sub_groups) > 0) {
        for (sub_group in sub_groups) {
          list_groups_recursively(sub_group)
        }
      }
    }
    # Opening the NetCDF file 
    nc <- open.nc(NC_FILE_PATH)
    
    # Apply the function to the root group, which is the file handle itself
    # (grp.inq.nc() expects a handle, not the list that it returns)
    list_groups_recursively(nc)
    
    # Close the NetCDF file
    close.nc(nc)
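
    The file created earlier in this guide has no subgroups, so the recursion stops after one level. The sketch below builds a small nested hierarchy and collects the full name of every group instead of printing it. The group names station_a, daily, and monthly, the path tmp_path, and the helper collect_group_names() are all hypothetical, chosen only for this illustration.

```r
library(RNetCDF)

# Hypothetical scratch file with a two-level hierarchy, built only for this sketch
tmp_path <- tempfile(fileext = ".nc")
nc <- create.nc(tmp_path, format = "netcdf4")
station <- grp.def.nc(nc, "station_a")   # child of the root group
grp.def.nc(station, "daily")             # subgroups of station_a
grp.def.nc(station, "monthly")

# Return the full names of a group and of all its descendants
collect_group_names <- function(grp) {
  info <- grp.inq.nc(grp)
  found <- info$fullname
  # info$grps is an empty list for a group without subgroups,
  # so the loop body is simply skipped in that case
  for (sub_group in info$grps) {
    found <- c(found, collect_group_names(sub_group))
  }
  found
}

all_names <- collect_group_names(nc)
print(all_names)

close.nc(nc)
```

    Returning a character vector rather than printing makes the result reusable, for example to filter groups by name before reading them.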
    

    By name

    Now suppose we know the group name and want the groupe pointer. The call grp.inq.nc(nc,name)$self returns the pointer for the group with that name.

    # Opening the NetCDF file 
    nc <- open.nc(NC_FILE_PATH)
    
    # Read the whole file, descending into groups
    nc_files <- read.nc(nc, recursive=TRUE)
    
    # Names of the top-level entries of nc_files (here, the group names)
    group_names <- names(nc_files)
    
    # Look up the pointer of each group from its name
    for(name in group_names){
      groupe <- grp.inq.nc(nc,name)$self
      print(groupe)
    }
    
    # Close the NetCDF file
    close.nc(nc)