Unexpected tidy eval behaviour in dplyr lag call


I have a function that takes a data frame and two variables, and I want it to produce a set of lagged variables using tidy evaluation principles. In its simplest form it looks like this:

library(dplyr)
cor_lags <- function(df, var1, var2) {
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  df %>% 
    select(!!var1, !!var2) %>% 
    mutate(lag1 = lag(!!var2, 1),
           lag2 = lag(!!var2, 2),
           lag3 = lag(!!var2, 3),
           lag4 = lag(!!var2, 4),
           lag5 = lag(!!var2, 5),
           lag6 = lag(!!var2, 6))
}

However, this produces NA values for all of the lagged variables.

cor_lags(dts_wide,"P26","P1")
# A tibble: 24 x 8
       P26     P1 lag1  lag2  lag3  lag4  lag5  lag6 
     <dbl>  <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
 1  84332.  2258. NA    NA    NA    NA    NA    NA   
 2  63995.  2752. NA    NA    NA    NA    NA    NA   
 3  86208. 10126. NA    NA    NA    NA    NA    NA   
 4 103455.  3767. NA    NA    NA    NA    NA    NA   
 5 160524. 12986. NA    NA    NA    NA    NA    NA   
 6 306683.  3944. NA    NA    NA    NA    NA    NA   
 7 599589.  3695. NA    NA    NA    NA    NA    NA   
 8 642343.  6202. NA    NA    NA    NA    NA    NA   
 9 482021.  8769. NA    NA    NA    NA    NA    NA   
10 220949.  5059. NA    NA    NA    NA    NA    NA  

Is there a reason the !! operators are not working inside the lag call? They clearly work in the select call.

I expect the function to behave like this version, which hard-codes the column name and works:

# Expected
cor_lags <- function(df, var1, var2) {
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  df %>% 
    select(!!var1, !!var2) %>% 
    mutate(lag1 = lag(P1, 1),
           lag2 = lag(P1, 2),
           lag3 = lag(P1, 3),
           lag4 = lag(P1, 4),
           lag5 = lag(P1, 5),
           lag6 = lag(P1, 6))
}

This produces the expected output:

cor_lags(dts_wide,"P26","P1")
# A tibble: 24 x 8
       P26     P1   lag1   lag2   lag3   lag4   lag5   lag6
     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1  84332.  2258.    NA     NA     NA     NA     NA     NA 
 2  63995.  2752.  2258.    NA     NA     NA     NA     NA 
 3  86208. 10126.  2752.  2258.    NA     NA     NA     NA 
 4 103455.  3767. 10126.  2752.  2258.    NA     NA     NA 
 5 160524. 12986.  3767. 10126.  2752.  2258.    NA     NA 
 6 306683.  3944. 12986.  3767. 10126.  2752.  2258.    NA 
 7 599589.  3695.  3944. 12986.  3767. 10126.  2752.  2258.
 8 642343.  6202.  3695.  3944. 12986.  3767. 10126.  2752.
 9 482021.  8769.  6202.  3695.  3944. 12986.  3767. 10126.
10 220949.  5059.  8769.  6202.  3695.  3944. 12986.  3767.

Solution

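  • You're mixing up quasi-quotation syntax: the function captures its arguments with enquo, but you call it with strings. enquo then simply wraps the string, so lag(!!var2, 1) becomes lag("P1", 1), which lags a length-one character vector and returns NA (hence the <chr> columns). select only appears to work because it also accepts column names as strings. Either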

    • replace enquo with sym (or rlang::sym) to turn the string into a symbol,

      cor_lags <- function(df, var1, var2) {
        var1 <- sym(var1)                               # Turn string into symbol
        var2 <- sym(var2)                               # Turn string into symbol
        df %>%
          select(!!var1, !!var2) %>%
          mutate(lag1 = lag(!!var2, 1),
                 lag2 = lag(!!var2, 2),
                 lag3 = lag(!!var2, 3),
                 lag4 = lag(!!var2, 4),
                 lag5 = lag(!!var2, 5),
                 lag6 = lag(!!var2, 6))
      }
      
      cor_lags(mtcars, "mpg", "disp") %>% head()        # var1, var2 as string
      #   mpg disp lag1 lag2 lag3 lag4 lag5 lag6
      #1 21.0  160   NA   NA   NA   NA   NA   NA
      #2 21.0  160  160   NA   NA   NA   NA   NA
      #3 22.8  108  160  160   NA   NA   NA   NA
      #4 21.4  258  108  160  160   NA   NA   NA
      #5 18.7  360  258  108  160  160   NA   NA
      #6 18.1  225  360  258  108  160  160   NA
      
    • or keep enquo and pass var1 and var2 as unquoted column names, which enquo captures as quosures:

      cor_lags <- function(df, var1, var2) {
        var1 <- enquo(var1)                             # Turn expression into quosure
        var2 <- enquo(var2)                             # Turn expression into quosure
        df %>%
          select(!!var1, !!var2) %>%
          mutate(lag1 = lag(!!var2, 1),
                 lag2 = lag(!!var2, 2),
                 lag3 = lag(!!var2, 3),
                 lag4 = lag(!!var2, 4),
                 lag5 = lag(!!var2, 5),
                 lag6 = lag(!!var2, 6))
      }
      cor_lags(mtcars, mpg, disp) %>% head()            # var1, var2 as expressions
      #   mpg disp lag1 lag2 lag3 lag4 lag5 lag6
      #1 21.0  160   NA   NA   NA   NA   NA   NA
      #2 21.0  160  160   NA   NA   NA   NA   NA
      #3 22.8  108  160  160   NA   NA   NA   NA
      #4 21.4  258  108  160  160   NA   NA   NA
      #5 18.7  360  258  108  160  160   NA   NA
      #6 18.1  225  360  258  108  160  160   NA
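
To see why the original version returned NA, it can help to look at what enquo actually captures in each case. The following is only a minimal sketch, not part of the original answer: `captured` is a hypothetical helper introduced for illustration, and mtcars stands in for dts_wide.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (an assumption for illustration, not from the original):
# returns the bare expression that enquo() captured for its argument.
captured <- function(x) quo_get_expr(enquo(x))

is.character(captured("disp"))   # a string stays a string
is.symbol(captured(disp))        # an unquoted name becomes a symbol

# With a string, lag(!!var2, 1) is effectively lag("disp", 1):
# a length-one character vector shifted by one position.
lag("disp", 1)

# Inside mutate() that single value is recycled to every row.
str_result <- mtcars %>% mutate(lag1 = lag("disp", 1))

# Converting the string to a symbol first makes lag() see the column itself.
sym_result <- mtcars %>% mutate(lag1 = lag(!!sym("disp"), 1))

class(str_result$lag1)
head(sym_result$lag1)
```

If that reading is right, the <chr> NA columns in the question come from the recycled result of lag("P1", 1), not from !! failing inside mutate.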