
Are there simple ways to lag (by group) in data frames in R without workarounds like data.table, xts, zoo, dplyr, etc.?


Whenever I want to lag a variable in a data frame, I realize that something that should be simple is not. Although the problem has been asked and answered many times (see p.s.), I have not found a simple solution that I can remember until the next time I need to lag. In general, lagging does not seem to be simple in R, as the many workarounds testify. I run into this problem often, and it would be very helpful to have some basic R solutions that need no extra packages. Could you provide your simple solution for lagging?

If that is not possible, could you at least post your workaround here, so we can choose among the second-best alternatives? One collection already exists here.

Also, in all the blog posts on this subject I see people complaining about how unexpectedly difficult lagging is, so how can we get a simple lag function for data frames into R Core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something, and there is a simple built-in solution?

Say we want to lag "value" by 3 years ("year") within each "country" in this data:

Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))

to create a column L3 like this:

 year country   value    L3
 2010      AT  0.3407    NA
 2011      AT -1.7981    NA
 2012      AT -0.8390    NA
 2013      AT -0.6888    0.3407
 2014      AT -1.1019   -1.7981
 2015      AT -0.8953   -0.8390
 2010      DE  0.5877    NA
 2011      DE -1.0204    NA
 2012      DE -0.6576    NA
 2013      DE  0.6620    0.5877
 2014      DE  0.9579   -1.0204
 2015      DE -0.7774   -0.6576

And we want neither to change the nature of our data (to ts or data.table) nor to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)

p.s.:

without groups

with data.table: Lag in dataframe or How to create a lag variable within each group?

time series are straightforward


Solution

  • If the question is how to add a column holding the value from three years earlier without using packages, then try this:

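    # shift x down by k positions, padding with NA (assumes rows are sorted by year within country)
    prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
    # apply the shift separately within each country
    transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))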
    

    giving:

       year country       value prior_year_value
    1  2010      AT -1.66562121               NA
    2  2011      AT -0.04950063               NA
    3  2012      AT  1.55930293               NA
    4  2013      AT -0.40462394      -1.66562121
    5  2014      AT  0.78602610      -0.04950063
    6  2015      AT  0.73912916       1.55930293
    7  2010      DE  1.03710539               NA
    8  2011      DE -1.13370942               NA
    9  2012      DE -1.20530981               NA
    10 2013      DE  1.66870572       1.03710539
    11 2014      DE  1.53615793      -1.13370942
    12 2015      DE -0.09693335      -1.20530981
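The `ave()` call above shifts by row position, so it silently assumes the rows are already sorted by year within each country. A minimal sketch that makes that ordering explicit wraps the same idea in a helper; `lag_by_group()` and its arguments are hypothetical names chosen for illustration, not part of base R:

```r
# Hypothetical helper (not part of base R): lag `var` by `k` rows within each
# level of `group`, after ordering rows by `time` inside each group.
lag_by_group <- function(df, var, group, time, k = 1) {
  # shift a vector down by k positions, padding with NA, keeping its length
  shift <- function(x) head(c(rep(NA, k), x), length(x))
  o <- order(df[[group]], df[[time]])        # row order: group, then time
  out <- rep(NA_real_, nrow(df))
  out[o] <- ave(df[[var]][o], df[[group]][o], FUN = shift)
  out                                        # values in the original row order
}

# Usage with data shaped like the question's, deliberately shuffled
Data <- data.frame(year = rep(2010:2015, 2),
                   country = rep(c("AT", "DE"), each = 6),
                   value = rnorm(12))
shuffled <- Data[sample(nrow(Data)), ]
shuffled$L3 <- lag_by_group(shuffled, "value", "country", "year", k = 3)
shuffled[order(shuffled$country, shuffled$year), ]
```

Because the result is written back through the ordering index, it comes out in the data frame's original row order and can be assigned straight into a new column, even when the rows were not sorted.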
    

    That said, to use R effectively you do need to learn how to use the key packages.
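If some years are missing, shifting by 3 rows is no longer the same as going back 3 calendar years. One base-R alternative, sketched here under the assumption that each (country, year) pair occurs at most once, looks up the row whose year is exactly `k` smaller using `match()`. The helper name `lag_by_time()` is hypothetical:

```r
# Hypothetical helper (not part of base R): for each row, return the value of
# `var` from the row of the same `group` whose `time` is exactly `k` smaller.
# Returns NA when no such row exists; assumes (group, time) pairs are unique.
lag_by_time <- function(df, var, group, time, k = 1) {
  key_now  <- paste(df[[group]], df[[time]])       # e.g. "AT 2013"
  key_then <- paste(df[[group]], df[[time]] - k)   # e.g. "AT 2010" when k = 3
  df[[var]][match(key_then, key_now)]              # NA where there is no match
}

# Usage: DE has no row for 2012, so DE 2015 gets NA rather than the 2011 value
gap <- data.frame(year = c(2010:2015, 2010:2011, 2013:2015),
                  country = rep(c("AT", "DE"), c(6, 5)),
                  value = rnorm(11))
gap$L3 <- lag_by_time(gap, "value", "country", "year", k = 3)
gap
```

Since the lookup uses the year values themselves rather than row positions, it does not depend on the row order either.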