
Why does passing eval(substitute(x,...)) into group_by() lead to a tibble with a column name "eval(substitute(x,...))"?


When passing quoted variable names as arguments to a function, I use the combination of eval() and substitute(x, list(x = as.name(x))). Generally I've had no problems, but I just encountered one when using this combination inside group_by(). When running the code below, the result has a column named "eval(substitute(variable, list(variable = as.name(variable))))". The column is not named after the evaluated argument variable, as I would have expected.

library(tidyverse)

df <- data.frame(v1 = rep(LETTERS[1:3], 3),
                 v2 = seq(0, 8, 1))

f1 <-  function (data, variable) {
  
  data %>%
    group_by(eval(substitute(variable, list(variable = as.name(variable))))) %>%
    summarise(c1 = mean(v2))
  
}

f1(df, "v1")

# Result

# A tibble: 3 × 2
  `eval(substitute(variable, list(variable = as.name(variable))))`    c1
  <chr>                                                            <dbl>
1 A                                                                    3
2 B                                                                    4
3 C                                                                    5

This is not a problem if group_by is not used:

f3 <- function(data, variable){
  
  data %>%
   filter(eval(substitute(variable, list(variable = as.name(variable)))) == "A") %>%
   as_tibble()
}

f3(df, "v1")

# Result

# A tibble: 3 × 2
  v1       v2
  <chr> <dbl>
1 A         0
2 A         3
3 A         6

Can someone explain what is going on here? Why does the result of group_by() not use the evaluated argument variable as the column name, even though the operation itself appears to have worked (i.e. the calculated means are correct)? I managed to get the expected output using the function below, but I do not understand why f2 gives the desired output and f1 doesn't, despite the grouping itself working as expected in both.

f2 <-  function (data, variable) {
  
  eval(substitute(data %>%
                   group_by(variable) %>%
                   summarise(c1 = mean(v2)), list(variable = as.name(variable))))

}

f2(df, "v1")

# A tibble: 3 × 2
  v1       c1
  <chr> <dbl>
1 A         3
2 B         4
3 C         5

Thanks for taking the time to help!


Solution

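  • If you are asking why the two cases differ, note that dplyr verbs take two types of arguments: data masking and tidy select.

    group_by and filter both take data-masking expressions, but they use them differently. group_by turns each expression into a grouping column and, when the expression is unnamed, labels the column with the text of the expression; filter only uses the value of the expression to choose rows, so no column name is involved. Which type of argument a function takes affects how to specify arguments programmatically.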

    The three variants below all work. The first uses rlang constructs (!! and sym). The second relies on the fact that, even though group_by uses data masking, pick uses tidy select. The third relies on the fact that the .by argument of summarise uses tidy select.

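    # Variant 1: convert the string to a symbol and inject it with !!
    f1a <- function(data, variable) {
      data %>%
        group_by(!!sym(variable)) %>%
        summarise(c1 = mean(v2))
    }
    f1a(df, "v1")

    # Variant 2: pick() takes a tidy selection (requires dplyr >= 1.1.0)
    f1b <- function(data, variable) {
      data %>%
        group_by(pick(any_of(variable))) %>%
        summarise(c1 = mean(v2))
    }
    f1b(df, "v1")

    # Variant 3: .by takes a tidy selection (requires dplyr >= 1.1.0)
    f1c <- function(data, variable) {
      data %>%
        summarise(c1 = mean(v2), .by = any_of(variable))
    }
    f1c(df, "v1")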
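
As for why f1 and f2 name the grouping column differently, here is a minimal sketch using the df from the question. The column name grp and the use of toupper() are illustrative choices, not part of the original code. The idea is that group_by() labels an unnamed expression with the expression's own text, whereas the substitute() call in f2 rewrites the whole pipeline before group_by() ever sees it.

```r
library(dplyr)

df <- data.frame(v1 = rep(LETTERS[1:3], 3),
                 v2 = seq(0, 8, 1))

# An unnamed expression in group_by() becomes a column named after its text.
unnamed <- df %>%
  group_by(toupper(v1)) %>%
  summarise(c1 = mean(v2))
names(unnamed)
# "toupper(v1)" "c1"

# Naming the expression controls the column name ("grp" is illustrative).
named <- df %>%
  group_by(grp = toupper(v1)) %>%
  summarise(c1 = mean(v2))
names(named)
# "grp" "c1"

# filter() uses the expression only to choose rows, so no column is created.
kept <- df %>%
  filter(toupper(v1) == "A")
names(kept)
# "v1" "v2"

# In f2, substitute() swaps the symbol in before anything is evaluated,
# so the call that finally runs is a plain group_by(data, v1).
variable <- "v1"
rewritten <- substitute(group_by(data, variable),
                        list(variable = as.name(variable)))
deparse(rewritten)
# "group_by(data, v1)"
```

In f1 the eval(substitute(...)) call reaches group_by() unevaluated, so it is used as the label. The variants f1a, f1b and f1c avoid this by handing group_by() or summarise() something that resolves to v1 before the column is labelled.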