Tags: r, dplyr, tidyverse, lag

Is it possible to lag values of a data frame in R by a time index?


My question concerns lagging data in R, where R should be aware of the time index. I hope the question has not already been asked in another thread. Let's consider a simple setup:

df <- data.frame(date=as.Date(c("1990-01-01","1990-02-01","1990-01-15","1990-03-01","1990-05-01","1990-07-01","1993-01-02")), value=1:7)

This code generates a table like

date value
1990-01-01 1
1990-02-01 2
1990-01-15 3
1990-03-01 4
1990-05-01 5
1990-07-01 6
1993-01-02 7

My aim is to lag "value" by, e.g., one month. For example, the lagged value for "1990-05-01" should come from 1990-04-01; since that date is not present in the data, the row should get an NA. When I use the standard lag function, R is not aware of the time index and simply uses the value "4" from 1990-03-01, which is not what I want. Does anyone have an idea what I could do here?
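To make the problem concrete, here is a minimal sketch of the positional behaviour described above. It assumes that the "standard lag function" means `dplyr::lag()`, which is a guess based on the question's tags; the names `positional` and `may_row` are only illustrative.

```r
library(dplyr)

df <- data.frame(
  date  = as.Date(c("1990-01-01", "1990-02-01", "1990-01-15", "1990-03-01",
                    "1990-05-01", "1990-07-01", "1993-01-02")),
  value = 1:7
)

# dplyr::lag() shifts by row position only; it never looks at the dates
positional <- dplyr::lag(df$value)

# The row for 1990-05-01 receives the value of the previous row (1990-03-01),
# even though 1990-04-01 does not exist in the data
may_row <- df$date == as.Date("1990-05-01")
positional[may_row]
```

The desired result for that row would be NA instead, because no observation exists exactly one month earlier.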

Thanks in advance! :)

All the best,

Leon


Solution

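  • You can use lubridate's %m-% operator to shift each date back by one month and then use match() to look up the value at that shifted date; match() returns NA when the shifted date is not present, as in the code below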

    library(lubridate)
    transform(
      df,
      value_lag = value[match(date %m-% months(1), date)]
    )
    

    which gives

            date value value_lag
    1 1990-01-01     1        NA
    2 1990-02-01     2         1
    3 1990-01-15     3        NA
    4 1990-03-01     4         2
    5 1990-05-01     5        NA
    6 1990-07-01     6        NA
    7 1993-01-02     7        NA
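The same lookup can be wrapped in a small helper so that the lag length becomes a parameter. This is only a sketch built on the answer's approach: the function name `lag_by_months` and its argument `n` are hypothetical, not part of lubridate, and it assumes the dates are unique, since `match()` returns only the first matching position.

```r
library(lubridate)

df <- data.frame(
  date  = as.Date(c("1990-01-01", "1990-02-01", "1990-01-15", "1990-03-01",
                    "1990-05-01", "1990-07-01", "1993-01-02")),
  value = 1:7
)

# Hypothetical helper: for each date, return the value observed exactly
# n calendar months earlier, or NA if that date is not in the data
lag_by_months <- function(dates, values, n = 1) {
  values[match(dates %m-% months(n), dates)]
}

df$value_lag1 <- lag_by_months(df$date, df$value, n = 1)
df$value_lag2 <- lag_by_months(df$date, df$value, n = 2)
df
```

With `n = 1` this reproduces the `value_lag` column shown above. With `n = 2`, the rows for 1990-03-01, 1990-05-01 and 1990-07-01 pick up the values 1, 4 and 5, and all other rows are NA.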