
Selecting by Column Names within Function


I feel like I'm losing my mind. This should be working, right?

I'm trying to select a variable inside a function that pipes a dataset, where the column name is passed in as an argument. I get the error: Error: object 'cyl' not found

table_fun = function(data, grouping_var){
  data %>% 
    select(grouping_var)%>%
    group_by(grouping_var)%>%
    summarise(count = n())
}

table_fun(mtcars, cyl)

Solution

  • See https://dplyr.tidyverse.org/articles/programming.html and, in particular, https://dplyr.tidyverse.org/articles/programming.html#one-or-more-user-supplied-expressions

    Tidyverse functions rely on a specific style of "non-standard evaluation", which lets you refer to a column of a data frame by its bare name, e.g. cyl instead of mtcars$cyl. When you call those functions inside your own function, you need to take some extra care: in your version, select(grouping_var) tries to evaluate grouping_var as an ordinary variable, which is where the "object 'cyl' not found" error comes from. In many simple cases, wrapping the argument in double braces, {{ }}, lets you pass a column name as a function parameter and have it looked up in the data.

    library(dplyr)

    # Embrace the argument so count() treats it as a column of `data`.
    table_fun <- function(data, grouping_var) {
      count(data, {{grouping_var}}, name = "count")
    }
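
    To see what the braces change, here is a small side-by-side sketch. The two helper names (table_fun_bare, table_fun_embraced) are made up for illustration, and it assumes there is no object called cyl in your global environment.

    ```r
    library(dplyr)

    # The question's approach: grouping_var is evaluated as an ordinary R variable.
    table_fun_bare <- function(data, grouping_var) {
      data %>% select(grouping_var)
    }

    # Same function with the argument embraced, so the column the caller
    # named is looked up in `data`.
    table_fun_embraced <- function(data, grouping_var) {
      data %>% select({{ grouping_var }})
    }

    # Capture the error instead of stopping the script.
    bare_result <- tryCatch(table_fun_bare(mtcars, cyl), error = function(e) e)
    inherits(bare_result, "error")

    names(table_fun_embraced(mtcars, cyl))
    ```

    The first call fails because R goes looking for an object named cyl outside the data frame; the second only ever looks inside data.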
    

    count(df, cyl, name = "count") is equivalent to df |> group_by(cyl) |> summarize(count = n()) or df |> summarize(count = n(), .by = cyl). The last form requires dplyr 1.1.0 or later (released in 2023).
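
    If you want to check that equivalence yourself, a rough sketch on mtcars follows. The variable names are mine, and the comparison converts everything to a plain data frame and sorts the .by result first, since the three forms can differ in class and row order.

    ```r
    library(dplyr)

    # Three ways to count rows per value of cyl in the built-in mtcars data.
    via_count    <- count(mtcars, cyl, name = "count")
    via_group_by <- mtcars |> group_by(cyl) |> summarize(count = n())
    via_by       <- mtcars |> summarize(count = n(), .by = cyl)  # needs dplyr >= 1.1.0

    # .by keeps groups in order of first appearance, so sort before comparing.
    via_by_sorted <- arrange(via_by, cyl)

    same_as_group_by <- all.equal(as.data.frame(via_count), as.data.frame(via_group_by))
    same_as_by       <- all.equal(as.data.frame(via_count), as.data.frame(via_by_sorted))

    same_as_group_by
    same_as_by
    ```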


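    In the comments, OP asked about the situation where we want to refer to a dynamically created column in a subsequent function. See the "name injection" section of the dplyr programming vignette linked above. The := operator lets the left-hand side come from a function argument: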

    library(dplyr)
    library(gt)

    table_fun <- function(data, grouping_var, new_name) {
      count(data, {{grouping_var}}, name = "count") |>
        gt() |>
        # := lets the column being relabelled come from the function argument
        cols_label({{grouping_var}} := new_name)
    }

    table_fun(mtcars, cyl, "Cylinder")
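
    Name injection also works in plain dplyr verbs, without gt. As a minimal sketch (count_named is a hypothetical helper, not part of the code above), the glue-style string on the left of := builds the new column name from the argument:

    ```r
    library(dplyr)

    # Hypothetical helper: names the count column after the grouping column,
    # e.g. grouping by cyl produces a column called "cyl_count".
    count_named <- function(data, grouping_var) {
      data |>
        group_by({{ grouping_var }}) |>
        summarize("{{ grouping_var }}_count" := n())
    }

    result_names <- names(count_named(mtcars, cyl))
    result_names
    ```

    The := is needed here because R does not allow an expression on the left of a regular = inside a function call.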
    
