Search code examples
rtime-serieslagdata-manipulation

lag and summarize time series data


I have spent a significant amount of time searching for an answer with little luck. I have some time series data and need to collapse and create a rolling mean of every nth row in that data. It looks like this is possible in zoo and maybe hmisc and i am sure other packages. I need to average rows 1,2,3 then 3,4,5 then 5,6,7 and so on. my data looks like such and has thousands of observations:

id   time    x.1    x.2    y.1    y.2
10     1     22     19     0      -.5
10     2     27     44     -1     0 
10     3     19     13     0      -1.5
10     4     7      22     .5     1
10     5     -15    5      .33    2
10     6     3      17     1      .33
10     7     6      -2     0      0
10     8     44     25     0      0
10     9     27     12     1      -.5
10     10    2      11     2      1

I would like it to look like this when complete:

id   time     x.1     x.2     y.1     y.2
10    1       22.66   25.33   -.33    -.66
10    2       3.66    13.33   .27     .50

The time var 1 would actually be times 1,2,3 averaged and 2 would be 3,4,5 averaged but at this point the time var would not be important to keep. I would need to group by id as it does change eventually. The only way I could figure out how to do this successfully was to use Lag() and make new rows lead by 1 and another by 2 then take average across columns. after that you have to delete every other row

  1  NA NA
  2  1  NA
  3  2  1
  4  3  2
  5  4  3

use the 123 and 345 and remove 234... to do this for each var would be outrageous especially as i gather new data.

any ideas? help would be much appreciated


Solution

  • something like this maybe?

    # sample data
    id <- c(10,10,10,10,10,10)
    time <- c(1,2,3,4,5,6)
    x1 <- c(22,27,19,7,-15,3)
    x2 <- c(19,44,13,22,5,17)
    
    df <- data.frame(id,time,x1,x2)
    
    means <- data.frame(rollmean(df[,c(1,3:NCOL(df))], 3))
    
    means <- means[c(T,F),]
    means$time <- seq(1:NROW(means))
    row.names(means) <- 1:NROW(means)
    
    > means
      id        x1       x2 time
    1 10 22.666667 25.33333    1
    2 10  3.666667 13.33333    2