Search code examples
rplyrzoorollapply

Sum column values over a window and report the values of the previous window


I´m having a data.frame of the following form:

ID  Var1   
1    1 
1    1
1    3
1    4
1    1
1    0
2    2
2    2
2    6
2    7
2    8
2    0
3    0
3    2
3    1
3    3
3    2
3    4

and I would like to get there:

ID  Var1  X  
1    1    0
1    1    0
1    3    0
1    4    5
1    1    5
1    0    5
2    2    0 
2    2    0
2    6    0
2    7    10
2    8    10
2    0    10
3    0    0
3    2    0
3    1    0
3    3    3
3    2    3
3    4    3

so in words: I´d like to calculate the sum of the variable in a window = 3, and then report the results obtained in the previous window. This should happen with respect to the IDs and thus the first three observations on every ID should be returned with 0, as there is no previous time period that could be reported. For understanding: In the actual dataset each row corresponds to one week and the window = 7. So X is supposed to give information on the sum of Var1 in the previous week.

I have tried using some rollapply stuff, but always ended in an error and also the window would be a rolling window if I got that right, which is specifically not what I need.

Thanks for your answers!


Solution

  • In rollapply, the width argument can be a list which provides the offsets to use. In this case we want to use the points 3, 2 and 1 back for the first point, 4, 3 and 2 back for the second, 5, 4 and 3 back for the third and then recycle. That is, for a window width of k = 3 we would want the following list of offset vectors:

    w <- list(-(3:1), -(4:2), -(5:3))
    

    In general we can write w below in terms of the window width k. ave then invokes rollapply with that width list for each ID.

    library(zoo)
    
    k <- 3
    w <- lapply(1:k, function(x) seq(to = -x, length = k))
    transform(DF, X = ave(Var1, ID, FUN = function(x) rollapply(x, w, sum, fill = 0)))
    

    giving:

       ID Var1  X
    1   1    1  0
    2   1    1  0
    3   1    3  0
    4   1    4  5
    5   1    1  5
    6   1    0  5
    7   2    2  0
    8   2    2  0
    9   2    6  0
    10  2    7 10
    11  2    8 10
    12  2    0 10
    13  3    0  0
    14  3    2  0
    15  3    1  0
    16  3    3  3
    17  3    2  3
    18  3    4  3
    

    Note

    The input DF in reproducible form is:

    DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
      2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), Var1 = c(1L, 1L, 3L, 4L, 1L, 
      0L, 2L, 2L, 6L, 7L, 8L, 0L, 0L, 2L, 1L, 3L, 2L, 4L)), 
      class = "data.frame", row.names = c(NA, -18L))