
R 3.5.2: Pipe inside custom function - object 'column' not found


I am having trouble using pipes inside a custom function. Based on previous posts, I understand that a pipe inside a function adds another level of evaluation(?), which results in the error I'm getting (see below).

I'm hoping to write a summary function for a large data set with hundreds of numeric and categorical variables. I would like to have the option to use this on different data frames (with similar structure), always group by a certain factor variable and get summaries for multiple columns.

library(tidyverse)
data(iris)

iris %>% group_by(Species) %>% summarise(count = n(), mean = mean(Sepal.Length, na.rm = T))

# A tibble: 3 x 3
  Species    count  mean
  <fct>      <int> <dbl>
1 setosa        50  5.01
2 versicolor    50  5.94
3 virginica     50  6.59

I'm hoping to create a function like this:

sum_cols <- function(df, col) {
  df %>%
    group_by(Species) %>%
    summarise(count = n(),
              mean = mean(col, na.rm = T))
}

And this is the error I'm getting:

sum_cols(iris, Sepal.Length)

Error in mean(col, na.rm = T) : object 'Sepal.Length' not found
Called from: mean(col, na.rm = T)

I have had this problem for a while and even though I tried to get answers in a few previous posts, I haven't quite grasped why the problem occurs and how to get around it.

Any help would be greatly appreciated, thanks!


Solution

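  • The keyword to search for is non-standard evaluation (NSE). dplyr verbs look up column names inside the data frame, but col is a function argument, so mean(col) ends up looking for Sepal.Length as an ordinary variable, which does not exist.

    Wrap the argument in {{ }} (available since rlang 0.4.0) to tell dplyr that col refers to a column of df.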

    library(dplyr)
    library(rlang)
    
    sum_cols <- function(df, col) {
      df %>%
        group_by(Species) %>%
        summarise(count = n(), mean = mean({{ col }}, na.rm = TRUE))
    }
    
    sum_cols(iris, Sepal.Length)
    
    # A tibble: 3 x 3
    #  Species    count  mean
    #  <fct>      <int> <dbl>
    #1 setosa        50  5.01
    #2 versicolor    50  5.94
    #3 virginica     50  6.59
    

    If your installed rlang is older than 0.4.0, {{ }} is not available; use the older combination of enquo() and !! instead:

    sum_cols <- function(df, col) {
      df %>%
        group_by(Species) %>%
        summarise(count = n(), mean = mean(!!enquo(col), na.rm = TRUE))
    }
    
    sum_cols(iris, Sepal.Length)
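
Since the question also asks for summaries of several columns and a reusable grouping variable, the same {{ }} idea can probably be extended as in the sketch below. This is only one possible approach, not part of the original answer: the name sum_cols_multi and its group and cols arguments are made up for illustration, and it assumes dplyr 1.0.0 or later (for across() and the .groups argument) together with rlang 0.4.0 or later.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 and rlang >= 0.4.0 are installed

# Hypothetical helper (not from the original answer): group by one column
# and compute the mean of every column selected by `cols`.
sum_cols_multi <- function(df, group, cols) {
  df %>%
    group_by({{ group }}) %>%
    summarise(
      # across() applies the same summary to each selected column;
      # .names controls how the output columns are labelled.
      across({{ cols }}, ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"),
      # count comes after across() so that a selection such as
      # where(is.numeric) cannot pick it up as an input column.
      count = n(),
      .groups = "drop"
    )
}

res <- sum_cols_multi(iris, Species, c(Sepal.Length, Petal.Width))
print(res)
```

Because cols is passed through to across(), any tidyselect expression should work there, for example all_of(c("Sepal.Length", "Petal.Width")) when the column names are stored as strings.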