I have a function that takes a data frame and a couple of variables, and I want it to produce a set of lagged variables using tidy evaluation principles. In its simplest form it looks like this:
library(dplyr)
cor_lags <- function(df, var1, var2) {
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  df %>%
    select(!!var1, !!var2) %>%
    mutate(lag1 = lag(!!var2, 1),
           lag2 = lag(!!var2, 2),
           lag3 = lag(!!var2, 3),
           lag4 = lag(!!var2, 4),
           lag5 = lag(!!var2, 5),
           lag6 = lag(!!var2, 6))
}
However, this produces NA values for all of the lagged variables.
cor_lags(dts_wide,"P26","P1")
# A tibble: 24 x 8
P26 P1 lag1 lag2 lag3 lag4 lag5 lag6
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 84332. 2258. NA NA NA NA NA NA
2 63995. 2752. NA NA NA NA NA NA
3 86208. 10126. NA NA NA NA NA NA
4 103455. 3767. NA NA NA NA NA NA
5 160524. 12986. NA NA NA NA NA NA
6 306683. 3944. NA NA NA NA NA NA
7 599589. 3695. NA NA NA NA NA NA
8 642343. 6202. NA NA NA NA NA NA
9 482021. 8769. NA NA NA NA NA NA
10 220949. 5059. NA NA NA NA NA NA
Is there a reason that the !! operators are not working within the lag call? They are clearly working in the select call.
I expect the function to behave like the following version, which hard-codes the column name and works:
# Expected
cor_lags <- function(df, var1, var2) {
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  df %>%
    select(!!var1, !!var2) %>%
    mutate(lag1 = lag(P1, 1),
           lag2 = lag(P1, 2),
           lag3 = lag(P1, 3),
           lag4 = lag(P1, 4),
           lag5 = lag(P1, 5),
           lag6 = lag(P1, 6))
}
And which produces, as expected:
cor_lags(dts_wide,"P26","P1")
# A tibble: 24 x 8
P26 P1 lag1 lag2 lag3 lag4 lag5 lag6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 84332. 2258. NA NA NA NA NA NA
2 63995. 2752. 2258. NA NA NA NA NA
3 86208. 10126. 2752. 2258. NA NA NA NA
4 103455. 3767. 10126. 2752. 2258. NA NA NA
5 160524. 12986. 3767. 10126. 2752. 2258. NA NA
6 306683. 3944. 12986. 3767. 10126. 2752. 2258. NA
7 599589. 3695. 3944. 12986. 3767. 10126. 2752. 2258.
8 642343. 6202. 3695. 3944. 12986. 3767. 10126. 2752.
9 482021. 8769. 6202. 3695. 3944. 12986. 3767. 10126.
10 220949. 5059. 8769. 6202. 3695. 3944. 12986. 3767.
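You're mixing up quasi-quotation syntax. enquo is meant to capture an unquoted expression, but you are passing strings, so !!var2 inserts the literal string "P1" into the call. lag("P1", 1) then returns a single character NA, which mutate recycles down the column (hence the <chr> columns full of NA). select still works because it accepts column names given as strings. Either replace enquo with sym (or rlang::sym) to turn the string into a symbol,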
cor_lags <- function(df, var1, var2) {
  var1 <- sym(var1)  # Turn string into symbol
  var2 <- sym(var2)  # Turn string into symbol
  df %>%
    select(!!var1, !!var2) %>%
    mutate(lag1 = lag(!!var2, 1),
           lag2 = lag(!!var2, 2),
           lag3 = lag(!!var2, 3),
           lag4 = lag(!!var2, 4),
           lag5 = lag(!!var2, 5),
           lag6 = lag(!!var2, 6))
}
cor_lags(mtcars, "mpg", "disp") %>% head() # var1, var2 as string
# mpg disp lag1 lag2 lag3 lag4 lag5 lag6
#1 21.0 160 NA NA NA NA NA NA
#2 21.0 160 160 NA NA NA NA NA
#3 22.8 108 160 160 NA NA NA NA
#4 21.4 258 108 160 160 NA NA NA
#5 18.7 360 258 108 160 160 NA NA
#6 18.1 225 360 258 108 160 160 NA
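As a rough check of why the string call failed while the sym version works, the two cases can be compared directly. This is only a sketch on mtcars; the names lag_of_string and lag_of_column are made up for illustration and are not part of the function above.

```r
library(dplyr)

# With a string argument, `!!var2` inserts the literal "disp" into the call,
# so lag() is applied to a length-one character vector, not to the column.
lag_of_string <- dplyr::lag("disp", 1)

# rlang::sym() turns the string into the symbol `disp`, which mutate()
# then looks up as a column of the data frame.
lag_of_column <- mtcars %>%
  mutate(lag1 = dplyr::lag(!!rlang::sym("disp"), 1)) %>%
  pull(lag1)

# Inspect both results: the first is a lone missing character value,
# the second is the disp column shifted down by one row.
print(lag_of_string)
print(head(lag_of_column))
```

The first result is what mutate recycles into every row in the original function, which would explain the all-NA character columns.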
or supply unquoted expressions for var1 and var2 and turn them into quosures with enquo:
cor_lags <- function(df, var1, var2) {
  var1 <- enquo(var1)  # Turn expression into quosure
  var2 <- enquo(var2)  # Turn expression into quosure
  df %>%
    select(!!var1, !!var2) %>%
    mutate(lag1 = lag(!!var2, 1),
           lag2 = lag(!!var2, 2),
           lag3 = lag(!!var2, 3),
           lag4 = lag(!!var2, 4),
           lag5 = lag(!!var2, 5),
           lag6 = lag(!!var2, 6))
}
cor_lags(mtcars, mpg, disp) %>% head() # var1, var2 as expressions
# mpg disp lag1 lag2 lag3 lag4 lag5 lag6
#1 21.0 160 NA NA NA NA NA NA
#2 21.0 160 160 NA NA NA NA NA
#3 22.8 108 160 160 NA NA NA NA
#4 21.4 258 108 160 160 NA NA NA
#5 18.7 360 258 108 160 160 NA NA
#6 18.1 225 360 258 108 160 160 NA
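If your installed versions are recent enough (this assumes rlang 0.4.0 or later for the embrace operator, and a dplyr that exports the .data pronoun and all_of), the same two approaches can probably be written a little more compactly. The function names cor_lags_embrace and cor_lags_string below are hypothetical, and only two lags are shown to keep the sketch short.

```r
library(dplyr)

# Hypothetical variant for unquoted column names: `{{ var }}` is shorthand
# for enquo(var) followed by !!var.
cor_lags_embrace <- function(df, var1, var2) {
  df %>%
    select({{ var1 }}, {{ var2 }}) %>%
    mutate(lag1 = lag({{ var2 }}, 1),
           lag2 = lag({{ var2 }}, 2))
}

# Hypothetical variant for column names given as strings: all_of() selects
# by name, and .data[[var2]] looks the column up inside mutate().
cor_lags_string <- function(df, var1, var2) {
  df %>%
    select(all_of(c(var1, var2))) %>%
    mutate(lag1 = lag(.data[[var2]], 1),
           lag2 = lag(.data[[var2]], 2))
}

res_embrace <- cor_lags_embrace(mtcars, mpg, disp)     # unquoted names
res_string  <- cor_lags_string(mtcars, "mpg", "disp")  # names as strings
```

Which one to prefer depends mostly on how the function will be called: interactively typed column names suit the embrace form, while names that arrive as strings (for example from a loop over column names) suit the string form.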