Search code examples
rsubsetstat

Data aggregation by week or by 3 days


Here is an example of my data:

 Date        Prec       aggregated by week (output)

        1/1/1950    3.11E+00    4.08E+00

        1/2/1950    3.25E+00    9.64E+00

        1/3/1950    4.81E+00    1.15E+01

        1/4/1950    7.07E+00    

        1/5/1950    4.25E+00    

        1/6/1950    3.11E+00    

        1/7/1950    2.97E+00    

        1/8/1950    2.83E+00    

        1/9/1950    2.72E+00    

        1/10/1950   2.72E+00    

        1/11/1950   2.60E+00    

        1/12/1950   2.83E+00    

        1/13/1950   1.70E+01    

        1/14/1950   3.68E+01    

        1/15/1950   4.24E+01    

        1/16/1950   1.70E+01    

        1/17/1950   7.07E+00    

        1/18/1950   3.96E+00    

        1/19/1950   3.54E+00    

        1/20/1950   3.40E+00    

        1/21/1950   3.25E+00

I have long time series precipitation data and I want to aggregate it in such a way that (output is in third column; I calculated it from excel) is as follows

If I do aggregation by weekly output in 1st cell = average prec from day 1 to 7 days. output in 2nd cell = average prec from 8 to 14 days. Output in 3rd cell=average prec from 15 to 21 day

If I do aggregation by 3 days output in 1st cell = average of day 1 to 3 days. output in 2nd cell = average of day 4 to 6 days.

I will provide the function with "prec" and the "time step" input. I tried loops and lubridate, POSIXct, and some other functions, but I cant figure out the output like in third column.

One code I came up with ran without error but my output is bot correct. Where dat is my data set.

  tt=as.POSIXct(paste(dat$Date),format="%m/%d/%Y") #converting date formate 
     datZoo <- zoo(dat[,-c(1,3)], tt)
     weekly <- apply.weekly(datZoo,mean)
    prec_NLCD <-data.frame (weekly)

Also I wanted to write it in form of a function. Your suggestions will be helpful.


Solution

  • Assuming the data shown reproducibly in the Note at the end create the weekly means, zm, and then merge it with z.

    (It would seem to make more sense to merge the means at the point that they are calculated, i.e. merge(z, zm) in place of the line marked ##, but for consistency with the output shown in the question they are put at the head of the data below.)

    library(zoo)
    
    z <- read.zoo(text = Lines, header = TRUE, format = "%m/%d/%Y")
    
    zm <- rollapplyr(z, 7, by = 7, mean)
    merge(z, zm = zoo(coredata(zm), head(time(z), length(zm)))) ##
    

    giving:

                   z        zm
    1950-01-01  3.11  4.081429
    1950-01-02  3.25  9.642857
    1950-01-03  4.81 11.517143
    1950-01-04  7.07        NA
    1950-01-05  4.25        NA
    1950-01-06  3.11        NA
    1950-01-07  2.97        NA
    1950-01-08  2.83        NA
    1950-01-09  2.72        NA
    1950-01-10  2.72        NA
    1950-01-11  2.60        NA
    1950-01-12  2.83        NA
    1950-01-13 17.00        NA
    1950-01-14 36.80        NA
    1950-01-15 42.40        NA
    1950-01-16 17.00        NA
    1950-01-17  7.07        NA
    1950-01-18  3.96        NA
    1950-01-19  3.54        NA
    1950-01-20  3.40        NA
    1950-01-21  3.25        NA
    

    Note:

    Lines <- "Date        Prec       
        1/1/1950    3.11E+00   
        1/2/1950    3.25E+00   
        1/3/1950    4.81E+00   
        1/4/1950    7.07E+00    
        1/5/1950    4.25E+00    
        1/6/1950    3.11E+00    
        1/7/1950    2.97E+00    
        1/8/1950    2.83E+00    
        1/9/1950    2.72E+00    
        1/10/1950   2.72E+00    
        1/11/1950   2.60E+00    
        1/12/1950   2.83E+00    
        1/13/1950   1.70E+01    
        1/14/1950   3.68E+01    
        1/15/1950   4.24E+01    
        1/16/1950   1.70E+01    
        1/17/1950   7.07E+00    
        1/18/1950   3.96E+00    
        1/19/1950   3.54E+00    
        1/20/1950   3.40E+00    
        1/21/1950   3.25E+00"