
Passing column names through the ellipsis in R


I have a data frame that looks like this:

df = data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5), race = rep(1:2, 5))

I'm trying to write a function that takes a data frame as its first argument, followed by any number of arguments that name columns in that data frame, and then uses those column names to group the data before summarising it. My function looks like this:

library(dplyr)
myFunction <- function(df, ...){
  columns <- list(...)
  for (i in 1:length(columns)){
    var <- enquo(columns[[i]])
    df <- df %>% group_by(!!var)
  }
  df2 = summarise(df, mean = mean(wt))
  return(df2)
}

I call the function as follows:

myFunction(df, race, gender)

However, I get the following error message:

Error in myFunction(df, race, gender) : object 'race' not found

Solution

  • We can capture the arguments in ... as quosures with quos() and then unquote-splice them into group_by() with !!!

    myFunction <- function(dat, ...){
      columns <- quos(...)  # capture the arguments as quosures
    
      dat %>% 
        group_by(!!!columns) %>%  # unquote-splice the quosures into group_by()
        summarise(mean = mean(wt))
    }
    
    myFunction(df, race, gender)
    # A tibble: 2 x 3
    # Groups:   race [?]
    #   race gender  mean
    #  <int>  <int> <dbl>
    #1     1      1    75
    #2     2      2    76
    
    myFunction(df, race)
    # A tibble: 2 x 2
    #   race  mean
    #  <int> <dbl>
    #1     1    75
    #2     2    76
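
    For context on why the original attempt fails: list(...) evaluates its arguments immediately, so R looks for an object called race in the calling environment, whereas quos(...) captures the expressions unevaluated. A minimal sketch of that difference, assuming the rlang package is installed (dplyr re-exports quos() from it). The helper names capture_dots and eval_dots are hypothetical and only serve the illustration:

    ```r
    library(rlang)

    # Hypothetical helper: capture the ... arguments as quosures, unevaluated
    capture_dots <- function(...) quos(...)

    # Hypothetical helper: list(...) forces evaluation of every argument
    eval_dots <- function(...) list(...)

    # The column names survive as expressions and can be inspected as labels
    qs <- capture_dots(race, gender)
    labels <- unname(vapply(qs, as_label, character(1)))
    print(labels)

    # No object named `race` exists in this environment, so evaluation fails;
    # tryCatch() returns the error message instead of stopping the script
    err <- tryCatch(eval_dots(race, gender),
                    error = function(e) conditionMessage(e))
    print(err)
    ```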
    

    NOTE: In the OP's example, 'race' and 'gender' are identical columns, so grouping by either one or by both gives the same result.

    If we change one of them, we will see the difference:

    df <- data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5), 
          race = rep(1:2, each = 5))
    
    myFunction(df, race, gender)
    myFunction(df, race)
    myFunction(df, gender)
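
    As an alternative sketch, group_by() accepts ... itself, so the dots can also be forwarded directly without calling quos() first. The function name myFunction3 is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later (it only drops the grouping from the result and silences the regrouping message):

    ```r
    library(dplyr)

    df <- data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5),
                     race = rep(1:2, each = 5))

    # Hypothetical variant: forward ... straight to group_by()
    myFunction3 <- function(dat, ...) {
      dat %>%
        group_by(...) %>%
        summarise(mean = mean(wt), .groups = "drop")  # assumes dplyr >= 1.0.0
    }

    by_both   <- myFunction3(df, race, gender)  # four groups
    by_race   <- myFunction3(df, race)          # two groups
    by_gender <- myFunction3(df, gender)        # two groups

    print(by_both)
    print(by_race)
    print(by_gender)
    ```

    With the modified data, the three calls no longer agree: grouping by both columns gives four rows, while grouping by race or by gender alone gives two rows each with different means.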
    

    If we prefer to pass the column names as quoted strings, we can use group_by_at():

    myFunction2 <- function(df, ...) {
      columns <- c(...)
      df %>% 
        group_by_at(columns) %>%
        summarise(mean = mean(wt))
    }
    
    myFunction2(df, "race", "gender")
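
    In dplyr 1.0.0 and later, the scoped verbs such as group_by_at() are superseded in favour of across(). A hedged sketch of the same string-based approach using across() with all_of(), assuming dplyr >= 1.0.0 is installed. The name myFunction2b is hypothetical:

    ```r
    library(dplyr)

    df <- data.frame(id = 1:10, wt = 71:80, gender = rep(1:2, 5),
                     race = rep(1:2, each = 5))

    # Hypothetical variant of myFunction2 that avoids group_by_at()
    myFunction2b <- function(df, ...) {
      columns <- c(...)  # character vector of column names
      df %>%
        group_by(across(all_of(columns))) %>%  # select grouping columns by name
        summarise(mean = mean(wt), .groups = "drop")
    }

    out <- myFunction2b(df, "race", "gender")
    print(out)
    ```

    all_of() raises an error if any of the supplied names is not a column of the data frame, which makes typos in the strings easier to spot.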