When passing quoted variables as arguments in a function, I use the combination of eval() and substitute(x, list(x = as.name(x))). Generally I've had no problems, but I just encountered one when using this combination inside group_by. When running the code below, the result has a column named "eval(substitute(variable, list(variable = as.name(variable))))". The column name is not the evaluated argument variable, as I'd expect.
library(tidyverse)

df <- data.frame(v1 = rep(LETTERS[1:3], 3),
                 v2 = seq(0, 8, 1))

f1 <- function(data, variable) {
  data %>%
    group_by(eval(substitute(variable, list(variable = as.name(variable))))) %>%
    summarise(c1 = mean(v2))
}

f1(df, "v1")
# Result
# A tibble: 3 × 2
`eval(substitute(variable, list(variable = as.name(variable))))` c1
<chr> <dbl>
1 A 3
2 B 4
3 C 5
This is not a problem if group_by is not used:
f3 <- function(data, variable) {
  data %>%
    filter(eval(substitute(variable, list(variable = as.name(variable)))) == "A") %>%
    as_tibble()
}

f3(df, "v1")
# Result
# A tibble: 3 × 2
v1 v2
<chr> <dbl>
1 A 0
2 A 3
3 A 6
Can someone explain what is going on here? Why does the result of group_by not have the evaluated argument variable as its column name, even though the operation itself appears to have worked (i.e. the calculated means are correct)? I have managed to get the expected output using the function below, but I do not understand why f2 gives the desired output and f1 doesn't, despite the operation occurring as expected.
f2 <- function(data, variable) {
  eval(substitute(data %>%
                    group_by(variable) %>%
                    summarise(c1 = mean(v2)),
                  list(variable = as.name(variable))))
}

f2(df, "v1")
# A tibble: 3 × 2
v1 c1
<chr> <dbl>
1 A 3
2 B 4
3 C 5
Thanks for taking the time to help!
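If you are asking why the difference exists between the two cases: dplyr verbs take two types of arguments, data masking and tidy select. The first is used by group_by and also by filter; the second is used by functions such as select and pick. The visible difference comes from what each verb does with a data-masking expression. filter only uses the value of the expression to choose rows, so nothing is named. group_by creates a grouping column, and when the expression is anything other than a bare column name, the new column is labelled with the text of the expression. In f2 the symbol is substituted into the call before dplyr sees it, so group_by receives a bare column name. Which type of argument is used in any particular case affects how to specify arguments.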
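To illustrate how group_by names its grouping column, here is a minimal sketch. The toupper() call and the column name g are hypothetical and only there for illustration; the last part shows what substitute produces in the f2 approach from the question.

```r
library(dplyr)

df <- data.frame(v1 = rep(LETTERS[1:3], 3),
                 v2 = seq(0, 8, 1))

# A bare column name is kept as the name of the grouping column.
by_name <- df %>% group_by(v1) %>% summarise(c1 = mean(v2))

# Any other expression is computed into a new column that is labelled
# with the text of the expression (toupper() is just an illustration).
by_expr <- df %>% group_by(toupper(v1)) %>% summarise(c1 = mean(v2))

# Naming the expression explicitly overrides the automatic label.
by_named <- df %>% group_by(g = toupper(v1)) %>% summarise(c1 = mean(v2))

names(by_name)
names(by_expr)
names(by_named)

# substitute() splices the symbol into the call before it is evaluated,
# so group_by ends up seeing a bare column name.
call_f2 <- substitute(group_by(df, variable), list(variable = as.name("v1")))
deparse(call_f2)
```

So one way to keep the original f1 and still get a readable name would be to name the expression, e.g. group_by(v1 = eval(...)), although the options shown further down are more idiomatic.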
Note that the three functions below all work. The first uses rlang constructs. The second makes use of the fact that even though group_by uses data masking, pick uses tidy select. The third makes use of the fact that the .by= argument of summarize uses tidy select.
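# rlang: turn the string into a symbol and unquote it with !!
f1a <- function(data, variable) {
  data %>%
    group_by(!!sym(variable)) %>%
    summarise(c1 = mean(v2))
}

f1a(df, "v1")

# pick() uses tidy select inside the data-masking group_by (dplyr >= 1.1.0)
f1b <- function(data, variable) {
  data %>%
    group_by(pick(any_of(variable))) %>%
    summarise(c1 = mean(v2))
}

f1b(df, "v1")

# .by uses tidy select directly (dplyr >= 1.1.0)
f1c <- function(data, variable) {
  data %>%
    summarise(c1 = mean(v2), .by = any_of(variable))
}

f1c(df, "v1")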
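The same three ideas should carry over to several grouping columns passed as a character vector. The following is only a sketch under some assumptions: the data frame df2, its extra column v3, and the vector variables are hypothetical, all_of is used instead of any_of so that a misspelled name raises an error, and pick and .by need dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data with a second grouping column, v3.
df2 <- data.frame(v1 = rep(LETTERS[1:3], 3),
                  v3 = rep(c("x", "y", "z"), each = 3),
                  v2 = seq(0, 8, 1))

variables <- c("v1", "v3")

# rlang: convert each string to a symbol and splice them all in with !!!
res_syms <- df2 %>%
  group_by(!!!rlang::syms(variables)) %>%
  summarise(c1 = mean(v2), .groups = "drop")

# tidy select inside group_by via pick()
res_pick <- df2 %>%
  group_by(pick(all_of(variables))) %>%
  summarise(c1 = mean(v2), .groups = "drop")

# tidy select via the .by argument of summarise
res_by <- df2 %>%
  summarise(c1 = mean(v2), .by = all_of(variables))

names(res_syms)
names(res_pick)
names(res_by)
```

Note that the row order can differ: group_by sorts the result by the grouping keys, while .by keeps the groups in order of first appearance.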