My goal is to use the NetCDF-4 standard to create large files containing diverse data and metadata. I haven't found comprehensive documentation on this topic, so I have written this short guide to share my findings.
I want to create and manipulate groups in a NetCDF-4 file. My goal is to define and read dimensions, attributes, and variables from the file.
To answer your question, here is a short guide.
In this guide we are going to use
library(RNetCDF)
The RNetCDF library is very useful for creating, organizing and reading NetCDF-4 files in groups.
Official documentation: https://cran.r-project.org/web/packages/RNetCDF/RNetCDF.pdf
First, you need a path where the NetCDF-4 file will be written and read. We'll call it NC_FILE_PATH.
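NC_FILE_PATH is not defined anywhere in this guide, so here is a minimal sketch of one way to set it. The file name example_groups.nc and the use of the session's temporary directory are assumptions for illustration; any writable location works.

```r
# Hypothetical location: a file inside R's per-session temporary directory.
NC_FILE_PATH <- file.path(tempdir(), "example_groups.nc")

# The file itself does not need to exist yet, but its directory must.
print(dir.exists(dirname(NC_FILE_PATH)))
```

For data you want to keep, replace tempdir() with a permanent directory.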
# ==== Fake data ====
time_values <- c(1,2,3,4,5,6,7,8,9)
pr_values <- c(11,12,13,14,15,16,17,18,19)
lat <- 14.65
lon <- 17.88
# Creation of the NetCDF file with NetCDF-4 format
nc_out <- create.nc(NC_FILE_PATH, format = "netcdf4")
# Close temporarily to avoid conflicts
close.nc(nc_out)
# Opening the NetCDF file
nc_out <- open.nc(NC_FILE_PATH, write = TRUE)
# Name each group after its index: "index_1", "index_2", ...
for (index in 1:10) {
  groupe <- grp.def.nc(nc_out, paste0("index_", index))
  # Define the "time" dimension
  dim.def.nc(groupe, "time", length(time_values))
  # Define the variables of the group along the "time" dimension
  var.def.nc(groupe, "time", "NC_DOUBLE", "time")
  var.def.nc(groupe, "pr_mm", "NC_DOUBLE", "time")
  # Add a "units" attribute to the variable "time"
  att.put.nc(groupe, "time", "units", "NC_CHAR", "days since 1970-01-01")
  # Add the group-level (global) attributes latitude and longitude
  att.put.nc(groupe, "NC_GLOBAL", "latitude", "NC_DOUBLE", lat)
  att.put.nc(groupe, "NC_GLOBAL", "longitude", "NC_DOUBLE", lon)
  # Write the values into the variables
  var.put.nc(groupe, "pr_mm", pr_values)
  var.put.nc(groupe, "time", time_values)
}
# Close the NetCDF file
close.nc(nc_out)
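After writing, it can be worth checking what actually ended up in the file. The sketch below is self-contained: it writes a small two-group file to a temporary path (check_path is a hypothetical name, not part of the guide), reopens it read-only, counts the groups and prints a summary of the structure with print.nc.

```r
library(RNetCDF)

# Hypothetical temporary file, used only for this check
check_path <- tempfile(fileext = ".nc")

# Write a small file with two groups, each holding one variable
nc_out <- create.nc(check_path, format = "netcdf4")
for (index in 1:2) {
  groupe <- grp.def.nc(nc_out, paste0("index_", index))
  dim.def.nc(groupe, "time", 3)
  var.def.nc(groupe, "time", "NC_DOUBLE", "time")
  var.put.nc(groupe, "time", c(1, 2, 3))
}
close.nc(nc_out)

# Reopen read-only (the default for open.nc) and inspect the content
nc_check <- open.nc(check_path)
n_groups <- length(grp.inq.nc(nc_check)$grps)  # groups directly under the root
print(n_groups)
print.nc(nc_check)  # summary of dimensions, variables and attributes
close.nc(nc_check)
```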
The RNetCDF library uses pointers: each group is represented by a pointer (a handle object) that can be passed to other functions. This pointer is found in the self element of the list returned by grp.inq.nc for that group. This means that whenever you need to retrieve an object or information from a group, you first get its pointer with $self and then read the data through it.
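To make the pointer idea concrete, here is a small sketch under stated assumptions: handle_path is a hypothetical temporary file and the attribute value is made up. It shows that grp.def.nc returns the group pointer directly, while grp.inq.nc returns a list of information whose self element is a pointer to the same group.

```r
library(RNetCDF)

handle_path <- tempfile(fileext = ".nc")  # hypothetical temporary file
nc <- create.nc(handle_path, format = "netcdf4")

# grp.def.nc returns the pointer of the newly created group
created <- grp.def.nc(nc, "index_1")

# grp.inq.nc returns a list of information about the group;
# the pointer is stored in its `self` element
info <- grp.inq.nc(nc, "index_1")
print(names(info))
print(info$name)

# Write through one pointer, read through the other: both refer to "index_1"
att.put.nc(info$self, "NC_GLOBAL", "latitude", "NC_DOUBLE", 14.65)
latitude <- att.get.nc(created, "NC_GLOBAL", "latitude")
print(latitude)

close.nc(nc)
```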
In this example we use the file created previously.
# Open the file and read its data
nc <- open.nc(NC_FILE_PATH)
# Extract time and pr_mm values from group with name 'index_1'
groupe <- grp.inq.nc(nc,"index_1")$self
time_values <- read.nc(groupe)$time
pr_values <- read.nc(groupe)$pr_mm
# Extract attributes from group with name 'index_1'
groupe <- grp.inq.nc(nc,"index_1")$self
# Get group global attributes
lat <- att.get.nc(groupe,"NC_GLOBAL","latitude")
lon <- att.get.nc(groupe,"NC_GLOBAL","longitude")
# Get time attribute
time_units <- att.get.nc(groupe, "time", "units")
# Extract dimension from group with name 'index_1'
groupe <- grp.inq.nc(nc,"index_1")$self
time_dim <- dim.inq.nc(groupe, "time") # named 'time_dim' to avoid masking base::dim
length_dim <- time_dim$length # Length of the dimension
name_dim <- time_dim$name # Name of the dimension
# Close the NetCDF file
close.nc(nc)
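The example above calls read.nc, which loads every variable of the group. When only one variable (or part of it) is needed, var.get.nc can be used instead. This is a sketch, not part of the original guide: var_path is a hypothetical temporary file and the values are the same fake data as above.

```r
library(RNetCDF)

var_path <- tempfile(fileext = ".nc")  # hypothetical temporary file

# Write one group with one variable
nc_out <- create.nc(var_path, format = "netcdf4")
groupe <- grp.def.nc(nc_out, "index_1")
dim.def.nc(groupe, "time", 9)
var.def.nc(groupe, "pr_mm", "NC_DOUBLE", "time")
var.put.nc(groupe, "pr_mm", c(11, 12, 13, 14, 15, 16, 17, 18, 19))
close.nc(nc_out)

nc <- open.nc(var_path)
groupe <- grp.inq.nc(nc, "index_1")$self

# Read only the "pr_mm" variable
pr_values <- var.get.nc(groupe, "pr_mm")

# Read a slice: 3 values starting at position 2 (positions are 1-based)
pr_slice <- var.get.nc(groupe, "pr_mm", start = 2, count = 3)

# Inspect the variable definition without reading its data
pr_info <- var.inq.nc(groupe, "pr_mm")
print(pr_info$type)

close.nc(nc)
```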
The goal of this code is to make the variable groupe contain a NetCDF group pointer. With this pointer, you can get any information you need about the group.
First, we want to go through all of our groups, get the groupe pointer of each one and then (for the example) display its name with a for loop. To do this, we take the list of all groups given by grp.inq.nc(nc)$grps and step through it by index. In summary, given a NetCDF file, a for loop lets us extract a characteristic of each group (here, the name) from its position in the list of all groups.
# Opening the NetCDF file
nc <- open.nc(NC_FILE_PATH)
# Print the name of every group directly under the root group
nc_global <- grp.inq.nc(nc)
for (grp_index in seq_along(nc_global$grps)) {
  # 'groupe' is the pointer at position grp_index in the list of all groups
  groupe <- nc_global$grps[[grp_index]]
  # Display the group name with grp.inq.nc()
  print(grp.inq.nc(groupe)$name)
}
# Close the NetCDF file
close.nc(nc)
If you want to apply the loop to all groups and subgroups of a NetCDF file, you can use a recursive function.
This recursive function displays the names of all groups and subgroups.
list_groups_recursively <- function(grp) {
  # Inquire once and reuse the result
  info <- grp.inq.nc(grp)
  # Display the group name
  print(info$name)
  # If the group has subgroups, call the function on each of them
  for (sub_group in info$grps) {
    list_groups_recursively(sub_group)
  }
}
# Opening the NetCDF file
nc <- open.nc(NC_FILE_PATH)
# Inquire about the root group of the file
nc_global <- grp.inq.nc(nc)
# Apply the function to the root group pointer (not to the list returned by grp.inq.nc)
list_groups_recursively(nc_global$self)
# Close the NetCDF file
close.nc(nc)
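A variant of the recursive walk returns the group paths instead of printing them, which is easier to reuse in later code. This is a sketch under assumptions: collect_group_paths and nested_path are hypothetical names, and the nested groups "a" and "a/b" exist only for the demonstration. It relies on the fullname element that grp.inq.nc returns when ancestors = TRUE (the default).

```r
library(RNetCDF)

# Hypothetical helper: return the full path of a group and of all its descendants
collect_group_paths <- function(grp) {
  info <- grp.inq.nc(grp)
  # unlist() of an empty list is NULL, so a group without subgroups
  # contributes only its own path
  c(info$fullname, unlist(lapply(info$grps, collect_group_paths)))
}

# Build a small file with a nested group to try it on
nested_path <- tempfile(fileext = ".nc")  # hypothetical temporary file
nc_out <- create.nc(nested_path, format = "netcdf4")
grp_a <- grp.def.nc(nc_out, "a")
grp_b <- grp.def.nc(grp_a, "b")  # subgroup of "a"
close.nc(nc_out)

nc <- open.nc(nested_path)
all_groups <- collect_group_paths(nc)  # the file pointer stands for the root group
print(all_groups)
close.nc(nc)
```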
Now that we know a group's name, we want its groupe pointer. To get it, we use grp.inq.nc(nc, name)$self, which returns the pointer to that group.
# Opening the NetCDF file
nc <- open.nc(NC_FILE_PATH)
# Read the whole NetCDF file, including its groups, into a nested list
nc_files <- read.nc(nc, recursive = TRUE)
# Names of the groups in nc_files (called 'group_names' to avoid masking base::names)
group_names <- names(nc_files)
# For each name returned by read.nc(), retrieve the matching group pointer
for (name in group_names) {
  groupe <- grp.inq.nc(nc, name)$self
  print(groupe)
}
# Close the NetCDF file
close.nc(nc)
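Because read.nc(nc, recursive = TRUE) loads all the data just to obtain the group names, a lighter alternative for large files may be to build a named list of pointers from grp.inq.nc alone. The sketch below is an assumption-laden illustration, not the method used above: lookup_path is a hypothetical temporary file and the latitude values are made up.

```r
library(RNetCDF)

lookup_path <- tempfile(fileext = ".nc")  # hypothetical temporary file

# Write three groups, each with a made-up latitude attribute
nc_out <- create.nc(lookup_path, format = "netcdf4")
for (index in 1:3) {
  groupe <- grp.def.nc(nc_out, paste0("index_", index))
  att.put.nc(groupe, "NC_GLOBAL", "latitude", "NC_DOUBLE", 10 + index)
}
close.nc(nc_out)

nc <- open.nc(lookup_path)

# List of group pointers, named after the groups; no variable data is read
group_handles <- grp.inq.nc(nc)$grps
names(group_handles) <- vapply(
  group_handles,
  function(g) grp.inq.nc(g)$name,
  character(1)
)

# Look up a pointer by group name and use it
lat_2 <- att.get.nc(group_handles[["index_2"]], "NC_GLOBAL", "latitude")
print(lat_2)

close.nc(nc)
```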