r, xts, lag

Lag time series with new rows


I would like to lead/lag a variable without losing any data points.

For example: x (xts object):

2017-06-30  1    
2017-07-31  2
2017-08-31  3
2017-09-30  4
2017-10-31  5
2017-11-30  6
2017-12-31  7
2018-01-31  8
2018-02-28  9
2018-03-31  10

When I do

lag(x,2)

I get:

2017-06-30  NA
2017-07-31  NA
2017-08-31  1
2017-09-30  2
2017-10-31  3
2017-11-30  4
2017-12-31  5
2018-01-31  6
2018-02-28  7
2018-03-31  8

The last two values are dropped: 9 should land on 2018-04-30 and 10 on 2018-05-31. Is there a way to keep them? I know the problem is that the additional dates (2018-04-30 and 2018-05-31) have to be generated somehow, but maybe there is a way to have that done automatically.


Solution

  • Somewhat verbose, but this does the job. It works on a data frame rather than an xts object, and it requires dplyr and magrittr.

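    library(dplyr)     # bind_rows(), mutate(), lag()
    library(magrittr)  # %<>% assignment pipe

    # Original data frame (month-end dates: first of each month minus one day)
    df <- data.frame(date = seq(as.Date("2017-07-01"), length = 10, by = "1 month") - 1, n = 1:10)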
    
    #          date  n
    # 1  2017-06-30  1
    # 2  2017-07-31  2
    # 3  2017-08-31  3
    # 4  2017-09-30  4
    # 5  2017-10-31  5
    # 6  2017-11-30  6
    # 7  2017-12-31  7
    # 8  2018-01-31  8
    # 9  2018-02-28  9
    # 10 2018-03-31 10
    

    Next, I define the lag length:

    # Length of lag
    lag_length <- 2
    

    Here, I create the extra rows to be added:

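    # Extra rows to add: the 'lag_length' month-ends that follow the last date
    # (first of the next month, stepped by whole months, minus one day)
    next_firsts <- seq(tail(df$date, 1) + 1, length = lag_length + 1, by = "1 month")
    extra <- data.frame(date = (next_firsts - 1)[-1], n = NA)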
    

    Finally, I bind them to the original data frame and lag the variable n:

    # Bind extra rows and lag 'n' by 'lag_length'
    # (%<>% pipes df through the chain and assigns the result back to df)
    df %<>%
      bind_rows(extra) %>%
      mutate(n = dplyr::lag(n, lag_length))
    
    # New data frame
    #          date  n
    # 1  2017-06-30 NA
    # 2  2017-07-31 NA
    # 3  2017-08-31  1
    # 4  2017-09-30  2
    # 5  2017-10-31  3
    # 6  2017-11-30  4
    # 7  2017-12-31  5
    # 8  2018-01-31  6
    # 9  2018-02-28  7
    # 10 2018-03-31  8
    # 11 2018-04-30  9
    # 12 2018-05-31 10
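
Since the question is about an xts object, the same idea (pad with NA rows first, then lag) can probably be applied without converting to a data frame. The following is only a sketch under a few assumptions: the index consists of month-end dates, the series has a single column, and the names `k`, `new_dates`, `padded` and `lagged` are illustrative, not part of the original answer. It calls `stats::lag()` explicitly so that the xts method is dispatched even if dplyr is loaded and masks `lag()`.

```r
library(xts)

# Rebuild the series from the question: month-end dates, values 1..10
dates <- seq(as.Date("2017-07-01"), length.out = 10, by = "1 month") - 1
x <- xts(1:10, order.by = dates)

k <- 2  # lag length (assumed positive)

# Next k month-ends after the last observation:
# move to the first of the following month, step by months, subtract one day
last_date <- tail(index(x), 1)
new_dates <- (seq(last_date + 1, length.out = k + 1, by = "1 month") - 1)[-1]

# Append k all-NA rows so the lagged values have somewhere to land
padded <- rbind(x, xts(rep(NA_integer_, k), order.by = new_dates))

# Shift the values forward by k periods; the first k rows become NA
lagged <- stats::lag(padded, k)
print(lagged)
```

If the dates are not month-ends, `new_dates` would have to be built differently; the padding and lagging steps stay the same.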