Tags: r, arrays, ncdf4

My array is too large to be allocated. What can I do to split it into smaller chunks? I cannot use split() as I cannot allocate a vector of 128.0 Gb


I have a numeric array whose structure is num [1:584, 1:712, 1:30, 1:365]. It holds the output of a model run, and its dimensions are [longitudes, latitudes, years, days].

I tried to save the data to an .nc file using ncvar_put but I got the error:

Error in ncvar_put(nca, my_variables, my_data_array) : long vectors not supported yet: ../include/Rinlinedfuns.h:537
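
For context, R reports "long vectors not supported yet" when a vector with more than 2^31 - 1 elements is passed to code that cannot handle one. A minimal check, assuming the dimensions quoted above, shows that the full array is over that limit while a single year slice is not:

```r
# Dimensions as stated in the question: lon, lat, years, days
dims <- c(584, 712, 30, 365)

# prod() returns a double, so the element count does not overflow
n_elements <- prod(dims)

# Largest length an ordinary (non-long) R vector can have: 2^31 - 1
max_regular <- .Machine$integer.max

n_elements > max_regular       # TRUE: the full array is a long vector
n_elements * 8 / 1024^3        # approximate size in GiB if stored as doubles

# One year slice [ , , i, ] has lon * lat * days elements
year_elements <- prod(dims[c(1, 2, 4)])
year_elements < max_regular    # TRUE: a single year stays under the limit
```

This suggests that passing the data one year at a time keeps each piece within the ordinary vector length.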

So I need to split the array. After some Google searches, I tried:

#clear all data frames from environment to save memory
rm(list=ls(all=TRUE)[sapply(mget(ls(all=TRUE)), class) == "data.frame"])

#splitting my_data_array into 5-year periods (5y = 1825 days)
chunk_length = 1825
split (Q0d_ym,
       ceiling(seq_along(Q0d_ym) / chunk_length))

which led to the error:

Error: cannot allocate vector of size 128.0 Gb.

MRE:

lons <- 583
lats <- 712
my_data_array <- array(0.,c(lons, lats, 30, 365))
#splitting my_data_array into 5-year periods (5y = 1825 days)
chunk_length = 1825
split (my_data_array,
       ceiling(seq_along(my_data_array) / chunk_length))


Any ideas would be gratefully received. I tried splitting the array into even smaller chunks but got the same error. I could run my model again in smaller chunks, but a way to split my current array into chunks that I can then save to an .nc file would be preferable, particularly given how long the model takes to run.


Solution

  • Use the start and count arguments of ncvar_put to write the data out one chunk at a time in a for loop, so that no single call receives the whole array. This seems to be working for me:

    library(ncdf4)
    
    # Stand-in data with the same layout as in the question: [lon, lat, year, day]
    lons <- 583
    lats <- 712
    LAT <- runif(lats)
    LON <- runif(lons)
    my_data_array <- array(0, c(lons, lats, 30, 365))
    # File dimensions: years and days are flattened into a single time axis
    dimLatitude <- ncdim_def(name = 'lat', units = 'degrees_north', vals = LAT)
    dimLongitude <- ncdim_def(name = 'lon', units = 'degrees_east', vals = LON)
    dimTime <- ncdim_def(name = 'StdTime', units = 'days_since_1970', vals = 1:(30*365), unlim = TRUE)
    # Define the variable on [lon, lat, time] and create the file
    my_variable <- ncvar_def(name = 'variable', units = '', dim = list(dimLongitude,dimLatitude,dimTime), missval = -9.96920996838687e+36)
    nc <- nc_create("C:/test.nc", my_variable)
    
    # Write one year (365 time steps) per call; start gives the offset along time
    for (i in 1:30) {
      ncvar_put(nc, my_variable, my_data_array[,,i,], start = c(1, 1, (i - 1L)*365L + 1L), count = c(lons, lats, 365))
    }
    
    # Close the file so everything is written to disk
    nc_close(nc)
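
The start/count arithmetic in the loop above can be checked without ncdf4 by copying into an ordinary in-memory array instead of a file. This is only a sketch of the indexing: the tiny dimensions and the names small and out are made up for illustration, and the plain array assignment stands in for ncvar_put.

```r
# Tiny stand-in dimensions (hypothetical, chosen so this runs instantly)
lons <- 2; lats <- 3; years <- 4; days <- 5
small <- array(as.numeric(seq_len(lons * lats * years * days)),
               c(lons, lats, years, days))

# Target layout used by the file: [lon, lat, time], with time = years * days
out <- array(NA_real_, c(lons, lats, years * days))

for (i in seq_len(years)) {
  start <- c(1, 1, (i - 1L) * days + 1L)     # same formula as in the loop above
  count <- c(lons, lats, days)
  idx <- start[3]:(start[3] + count[3] - 1L) # time steps covered by this chunk
  out[, , idx] <- small[, , i, ]             # stands in for one ncvar_put call
}

# Time step (year - 1) * days + day should hold small[, , year, day]
all(out[, , (3 - 1) * days + 2] == small[, , 3, 2])  # TRUE
anyNA(out)                                           # FALSE: every slot was filled
```

With this layout, day d of year y ends up at time index (y - 1) * 365 + d in the real file, which is the position to pass as start when reading a single day back.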