Search code examples
rgeospatialnetcdfncdf4

How to apply a monthly mean to daily data in a 3d array, storing this to a 4d array?


I have daily geo-spatial data over a period of 2 years. The data is in two nc files, each containing 365 days' worth of data. I want to apply a calculation to the data, and then calculate the monthly mean of this for each 12 months of each year across each gridbox.

lons <- 584
lats <- 712
MyValues_ym <- array(0., c(lons, lats, 2, 12))
MyCalculation <- array(0., c(lons, lats, ts))  ## set up data array to fit dailycalculations 
MyMonths = c('01','02','03','04','05','06','07','08','09','10','11','12')
ts <- dim(P)[[3]]  ## set MyCalculation array coordinates the same as precipitation coordinates
## Dimension of precipitation coordinates: [584L, 712L, 365L]
## calculate MeanMonthly  #calculated monthly q0 
## Loop over months
for (m in 1:12) {
  MyValues_ym[, , y - 1982, m] <- apply(MyCalculation, c(2), mean)
}

This is giving the error:

Error in `[<-`(`*tmp*`, , , y - 1982, m, value = c(NA_real_, NA_real_,  : 
  subscript out of bounds

Any help would be greatly appreciated.

I have tried to use rowMeans, but this was not calculating the mean for the time dimension of my array. I am not sure how to calculate the mean for the particular dimension required.

I tried this with RowMeans:

    # Loop over months
    for (m in 1:12) {     
    my_months_yearly <- format(daterange, '%m') == MyMonths[m]    
    MyValues_ym[,,y-1982,m]=rowMeans(Q0d[,,my_months_yearly], na.rm = TRUE, dims = 2) 

EDIT

I have now made a MRE:

lons <- 10
lats <- 10


MyYearlyMonthlyData <- array(0., c(lons,lats,1,12)) ## set up array for monthly calculation 
MyCalculationYr1 <- array(runif(1:100), c(lons, lats, 365)) ## set up data array to fit dailycalculations 

## set up the date range and months for the loop
start = '1985-01-01'
end = '1985-12-31'
daterange = seq(as.Date(start), as.Date(end), "days")
MyMonths = c('01','02','03','04','05','06','07','08','09','10','11','12')
y=1985

## the loop to generate monthly values of each day of each month for the year

for (m in 1:12) { 
  blMM <- format(daterange,'%m') == MyMonths[m]
  MyYearlyMonthlyData[,,y-1984,m] = rowMeans(MyCalculationYr1[,,blMM],na.rm=TRUE,dims=2) 
}  


Solution

  • The precipitation data is from the TAMSAT archive (as per your comment under your post). I quickly downloaded a file and it conforms to the CF Metadata Conventions, as is common for meteorological data distributed through NetCDF files. You can check that easily by checking for the "Conventions" global attribute:

    nc <- nc_open("./rfe_file.nc")
    ncatt_get(nc, "")$Conventions
    

    These files typically contain 3-d arrays of data, as is the case here. Dimensions are lon, lat and time. You can easily work with the time dimension using the CFtime package:

    library(CFtime)
    
    # Create a CFtime instance from the "time" dimension in your file
    cf <- CFtime(nc$dim$time$units, nc$dim$time$calendar, nc$dime$time$vals)
    
    # Create a factor over the months in your data file
    mon <- CFfactor(cf, "month")
    
    # Extract the data from the file
    data <- ncvar_get(nc, "ref_filled")
    
    # Aggregate to monthly mean (mm/day)
    pr <- apply(data, 1:2, tapply, mon, mean)
    
    # Re-arrange the dimension after tapply() mixed them up
    pr <- aperm(pr, c(2, 3, 1))
    
    # Label the dimensions
    dimnames(pr) <- list(nc$dim$lon$vals, nc$dim$lat$vals, levels(mon))
    nc_close(nc)
    

    Note that you will now have a 3-d array with the 3rd dimension having 24 elements with labels such as "2021-01", "2021-02" ... "2022-12". The apply() function applied the function tapply() over the first two dimensions, with tapply() applying the mean() function using the factor mon, so in the result every grid cell has the monthly mean values for the two years.