Aggregate nc file using specific condition in R


I need your help again. I have a .nc file with the following metadata: File minty.nc (NC_FORMAT_64BIT):

 1 variables (excluding dimension variables):
    short mn2t[longitude,latitude,time]   
        scale_factor: 0.000940940342005054
        add_offset: 259.294916797895
        _FillValue: -32767
        missing_value: -32767
        units: K
        long_name: Minimum temperature at 2 metres since previous post-processing

 3 dimensions:
    longitude  Size:57
        units: degrees_east
        long_name: longitude
    latitude  Size:49
        units: degrees_north
        long_name: latitude
    time  Size:90240
        units: hours since 1900-01-01 00:00:00.0
        long_name: time
        calendar: gregorian

2 global attributes:
    Conventions: CF-1.6
    

I have code that works well with smaller .nc files:

 library(ncdf4)  # provides nc_open(), ncvar_get() and ncatt_get()
 library(raster)
 library(rgdal)
 library(ggplot2)
 nc_data = nc_open('file.nc')
 lon = ncvar_get(nc_data, "longitude")
 lat = ncvar_get(nc_data, "latitude", verbose = F)
 t = ncvar_get(nc_data, "time")
 head(lon)
 head(t)
 head(lat)
 mint.array = ncvar_get(nc_data, "mn2t")
 dim(mint.array)
 fillvalue = ncatt_get(nc_data, "mn2t", "_FillValue")
 fillvalue
 mint.array[mint.array == fillvalue$value] <- NA
 r_brick <- brick(mint.array, xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
 r_brick = flip(t(r_brick), direction = 'y')

Because of the large file size, I got an error: "cannot allocate vector of size 1.4 Mb". I also used gc() to clear unused memory, but it didn't help. I do not need all the data in file.nc, so I need to aggregate it somehow: for my further calculations, I need only the daily minima. For a data frame, I used: df(ff) <- aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = 24, len = nrow(df))), min)

Unfortunately, I find it difficult to adapt this code for a .nc file. Maybe someone could help me. Thank you in advance!


Solution

  • To avoid memory problems, you can do this instead:

    library(raster)
    r_brick <- brick('file.nc', varname = "mn2t")
    

    It also prevents mistakes. For example, in your code, this is wrong in two ways:

    xmn=min(lat), xmx=max(lat), ymn=min(lon), ymx=max(lon)
    

    because x should be lon and y should be lat, and because the NetCDF coordinates refer to the centers of the cells, whereas xmn, xmx, ymn, and ymx refer to the borders.
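
To make the center-versus-border point concrete, here is a minimal base R sketch. The coordinate values are hypothetical (a regular 0.75-degree grid is assumed purely for illustration; only the sizes 57 and 49 come from the file's metadata), and in practice `lon` and `lat` would come from `ncvar_get()`.

```r
# Hypothetical cell-centre coordinates on an assumed regular 0.75-degree grid,
# standing in for the values ncvar_get() would return for this file.
lon <- seq(10, by = 0.75, length.out = 57)
lat <- seq(70, by = -0.75, length.out = 49)

# Grid spacing, assuming the coordinates are evenly spaced.
res_x <- abs(diff(lon))[1]
res_y <- abs(diff(lat))[1]

# Cell borders lie half a cell beyond the outermost centres.
# Note that x uses longitude and y uses latitude.
xmn <- min(lon) - res_x / 2
xmx <- max(lon) + res_x / 2
ymn <- min(lat) - res_y / 2
ymx <- max(lat) + res_y / 2

# Sanity check: the extent spans exactly (number of cells) * (cell size).
stopifnot(isTRUE(all.equal(xmx - xmn, length(lon) * res_x)))
stopifnot(isTRUE(all.equal(ymx - ymn, length(lat) * res_y)))
```

Reading the file directly with `brick()` or `rast()` should make this bookkeeping unnecessary, since the extent is then derived from the file itself.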

    You can also use the modern equivalent:

    library(terra)
    r <- rast('file.nc')
    

    And to get the minimum value for each day, you can do:

    x <- tapp(r, "days", min)
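
As a rough illustration of what the `tapp()` call does, the sketch below builds a small synthetic raster instead of reading `file.nc`. The grid size, the start date, and the layer values are all made up for the example; it assumes the real file's time axis is hourly, as the `each=24` in the question suggests.

```r
library(terra)

# Small synthetic raster: 48 hourly layers (two days) on a 3 x 3 grid.
r <- rast(nrows = 3, ncols = 3, nlyrs = 48)

# One column per layer, so layer i holds the value i in every cell.
values(r) <- matrix(rep(1:48, each = ncell(r)), nrow = ncell(r))

# Attach hourly time stamps (hypothetical start date). A NetCDF file with a
# CF-style time axis normally gets these when it is read with rast().
time(r) <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + 3600 * (0:47)

# One output layer per calendar day, each holding the per-cell minimum.
x <- tapp(r, "days", min)

nlyr(x)
```

If the time stamps were not picked up from the file, `time(r)` would need to be set by hand before `tapp(r, "days", min)` can group the layers by day.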