Can anyone explain to me why unquoting does not work in the following?
I want to pass on a user-specified column name (given as a function argument) in a call to do() in version 0.7.4 of dplyr. This does seem somewhat less awkward than the older standard evaluation approach using do_(). A basic (successful) example, ignoring the fact that using do() here is entirely unnecessary, would be something like:
sum_with_do <- function(D, x, ...) {
  x <- rlang::ensym(x)
  gr <- quos(...)
  D %>%
    group_by(!!! gr) %>%
    do(data.frame(y=sum(.[[quo_name(x)]])))
}
D <- data.frame(group=c('A','A','B'), response=c(1,2,3))
sum_with_do(D, response, group)
# A tibble: 2 x 2
# Groups: group [2]
  group     y
  <fct> <dbl>
1 A        3.
2 B        3.
The `rlang::` prefix is unnecessary as of dplyr 0.7.5, which now exports ensym(). I have included lionel's suggestion of using ensym() here rather than enquo(), as the former guarantees that the value of `x` is a symbol (not an expression).
Unquoting is not useful here (unlike in other dplyr examples): replacing `quo_name(x)` with `!! x` in the above produces the following error:
Error in ~response : object 'response' not found
As per the accepted response, the underlying reason is that do() does not evaluate the expression in the same environment that other dplyr functions (e.g. mutate()) use.
I did not find this to be abundantly clear from either the documentation or the source code (e.g. compare the source for mutate() and do() for data.frames and follow Alice down the rabbit hole if you wish), but essentially, and this is probably nothing new to most: do() evaluates expressions in an environment whose parent is the calling environment, and attaches the current group (slice) of the data.frame to the symbol `.`.
See also Advanced R (22. Evaluation) for a description in terms of 'data masking'.
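To make the difference concrete, here is a minimal sketch (the variable name `col_name` and the result names are only illustrative). It assumes a dplyr version in which do() is still available. In mutate() a bare column name is resolved in the data, whereas in do() ordinary variables are resolved in the calling environment and the data is only reachable through `.`:

```r
library(dplyr)

D <- data.frame(group = c("A", "A", "B"), response = c(1, 2, 3))

# mutate(): data masking, so the bare name `response` is found in D.
res_mutate <- mutate(D, y = 2 * response)

# do(): no data masking. `col_name` is an ordinary variable looked up in
# the calling environment; the data itself is only available as `.`.
col_name <- "response"
res_do <- do(D, data.frame(y = sum(.[[col_name]])))
```

Here `res_do` holds a single row because `D` is not grouped; with a grouped input, `.` would be bound to each group in turn.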
This is because of regular do() semantics, where there is no data masking apart from `.`:
do(df, data.frame(y = sum(.$response)))
#> y
#> 1 6
do(df, data.frame(y = sum(.[[response]])))
#> Error: object 'response' not found
So you just need to capture the bare column name as a string and there is no need to unquote since there is no data masking:
sum_with_do <- function(df, x, ...) {
  # ensym() guarantees that `x` is a simple column name and not a
  # complex expression:
  x <- as.character(ensym(x))
  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}
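For comparison, and since the question notes that do() is not really needed for this task, the following is a sketch of the same computation with summarise(). The function name `sum_with_summarise` is hypothetical and not part of the original posts. Because summarise() does use data masking, unquoting the captured symbol with `!!` works there:

```r
library(dplyr)

# Hypothetical counterpart to sum_with_do(), using summarise() instead of do().
sum_with_summarise <- function(df, x, ...) {
  # ensym() captures the bare column name as a symbol.
  x <- ensym(x)
  df %>%
    group_by(...) %>%
    # Data masking applies here, so the unquoted symbol resolves to the column.
    summarise(y = sum(!!x))
}

D <- data.frame(group = c("A", "A", "B"), response = c(1, 2, 3))
res <- sum_with_summarise(D, response, group)
```

The result has one row per group with the column sum in `y`, matching what the do() version computes for the same input.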