Search code examples
rdplyrrlang

Multiple Lags with dplyr


I refer to the excellent post at

https://purrple.cat/blog/2018/03/02/multiple-lags-with-tidy-evaluation/

What I want to do is to create a function capable, à la dplyr, to generate new columns which are a lagged version of existing columns in a data frame. For instance, you can do this manually as

 library(dplyr)
 library(rlang)

 d2 <- tibble(x1 =1:10, x2=10:19,  x3=50:59)

 d3 <- d2%>%mutate(x1lag1=lag(x1, 1), x1lag2=lag(x1,2))

but this becomes quickly tedious when you need to take several lags of different columns. One solution in the link above is the following

lags <- function(var, n=10){
 var <- enquo(var)

  indices <- seq_len(n)
  map( indices, ~quo(lag(!!var, !!.x)) ) %>%
   set_names(sprintf("lag_%s_%02d", quo_text(var), indices))

 }


d4 <- d2 %>%
  mutate( !!!lags(x1, 3), !!!lags(x2,3) )

Does anybody know how this could be made more general? I mean that I would like to take a fixed number of lags of a list of columns (x1 and x2, for instance), just by passing the list of columns and without repeating the commands for x1 and x2.

Any suggestion is appreciated.


Solution

  • I think the idea would be to use ... instead of var, which would follow closest the spirit of your function.

    To do this, it required changing enquo() to enquos(), and I use here a crossing and map2, but there's probably a more elegant way to do so...

    library(tidyverse)
    library(rlang)
    #> 
    #> Attaching package: 'rlang'
    #> The following objects are masked from 'package:purrr':
    #> 
    #>     %@%, as_function, flatten, flatten_chr, flatten_dbl,
    #>     flatten_int, flatten_lgl, flatten_raw, invoke, list_along,
    #>     modify, prepend, splice
    
    d <- data_frame(x = seq_len(100),
                    y = rnorm(100))
    #> Warning: `data_frame()` is deprecated, use `tibble()`.
    #> This warning is displayed once per session.
    
    multijetlag <- function(data, ..., n=10){
      variable <- enquos(...)
    
      indices <- seq_len(n)
      combos <- crossing(indices, var =as.list(variable))
    
      quosures <- map2(combos$indices, combos$var,
                       ~quo(lag(!!.y, !!.x)) ) %>% 
        set_names(paste("lag", combos$indices, map_chr(combos$var, quo_text), sep = "_"))
      mutate( data, !!!quosures )
    
    }
    
    multijetlag(d, x, y, n=3)
    #> # A tibble: 100 x 8
    #>        x       y lag_1_x  lag_1_y lag_2_x  lag_2_y lag_3_x lag_3_y
    #>    <int>   <dbl>   <int>    <dbl>   <int>    <dbl>   <int>   <dbl>
    #>  1     1  0.213       NA  NA           NA  NA           NA  NA    
    #>  2     2  0.277        1   0.213       NA  NA           NA  NA    
    #>  3     3 -0.517        2   0.277        1   0.213       NA  NA    
    #>  4     4 -0.671        3  -0.517        2   0.277        1   0.213
    #>  5     5 -1.12         4  -0.671        3  -0.517        2   0.277
    #>  6     6 -0.296        5  -1.12         4  -0.671        3  -0.517
    #>  7     7 -1.18         6  -0.296        5  -1.12         4  -0.671
    #>  8     8  0.0582       7  -1.18         6  -0.296        5  -1.12 
    #>  9     9 -0.455        8   0.0582       7  -1.18         6  -0.296
    #> 10    10 -0.969        9  -0.455        8   0.0582       7  -1.18 
    #> # … with 90 more rows
    

    Created on 2019-04-23 by the reprex package (v0.2.1)