I am having issues with pipes inside a custom function. Based on previous posts on this topic, I understand that a pipe inside a function creates another level(?), which results in the error I'm getting (see below).
I'm hoping to write a summary function for a large data set with hundreds of numeric and categorical variables. I would like to have the option to use this on different data frames (with similar structure), always group by a certain factor variable and get summaries for multiple columns.
library(tidyverse)
data(iris)
iris %>%
  group_by(Species) %>%
  summarise(count = n(), mean = mean(Sepal.Length, na.rm = T))
# A tibble: 3 x 3
  Species    count  mean
  <fct>      <int> <dbl>
1 setosa        50  5.01
2 versicolor    50  5.94
3 virginica     50  6.59
I'm hoping to create a function like this:
sum_cols <- function (df, col) {
  df %>%
    group_by(Species) %>%
    summarise(count = n(),
              mean = mean(col, na.rm = T))
}
And this is the error I'm getting:
sum_cols(iris, Sepal.Length)
Error in mean(col, na.rm = T) : object 'Sepal.Length' not found
Called from: mean(col, na.rm = T)
I have had this problem for a while and even though I tried to get answers in a few previous posts, I haven't quite grasped why the problem occurs and how to get around it.
Any help would be greatly appreciated, thanks!
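Try searching for non-standard evaluation (NSE). Inside your function, col is evaluated as an ordinary R variable, so mean(col) makes R look for an object called Sepal.Length in the environment you called the function from, not for a column of df. That is why it is "not found".
Here you can use {{ }} ("curly-curly") to tell dplyr that col refers to a column name in df.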
library(dplyr)
library(rlang)
sum_cols <- function (df, col) {
  # {{ col }} forwards the column the caller named, e.g. Sepal.Length
  df %>%
    group_by(Species) %>%
    summarise(count = n(), mean = mean({{ col }}, na.rm = TRUE))
}
sum_cols(iris, Sepal.Length)
# A tibble: 3 x 3
#   Species    count  mean
#   <fct>      <int> <dbl>
# 1 setosa        50  5.01
# 2 versicolor    50  5.94
# 3 virginica     50  6.59
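If the column names are available as strings instead (for example, when looping over hundreds of variables taken from names(df)), a sketch using the .data pronoun avoids quoting altogether. The names sum_col_chr, num_cols and summaries below are hypothetical, chosen just for this illustration; .data[[ ]] itself is the pronoun dplyr provides for looking up a column by a character name.

```r
library(dplyr)

# Hypothetical variant of sum_cols() that takes the column name as a string.
# .data[[col]] looks the string up as a column of the grouped data frame.
sum_col_chr <- function(df, col) {
  df %>%
    group_by(Species) %>%
    summarise(count = n(), mean = mean(.data[[col]], na.rm = TRUE))
}

sum_col_chr(iris, "Sepal.Length")

# Loop over several numeric columns by name; gives a named list of tibbles.
num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
summaries <- lapply(setNames(num_cols, num_cols),
                    function(nm) sum_col_chr(iris, nm))
```

The string form is convenient when the column list is computed rather than typed; the {{ }} form is nicer for interactive use.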
If you have an older version of rlang (before 0.4.0, which introduced {{ }}), you can use the older enquo() and !! ("bang-bang") approach instead:
sum_cols <- function (df, col) {
  # enquo() captures the column expression; !! unquotes it inside summarise()
  df %>%
    group_by(Species) %>%
    summarise(count = n(), mean = mean(!!enquo(col), na.rm = TRUE))
}
sum_cols(iris, Sepal.Length)
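Since the question also asks for summaries of multiple columns, here is a sketch that passes a whole set of columns through {{ }} into across(), assuming dplyr 1.0.0 or later. The grouping variable is made an argument as well, so the same function works on other data frames with a similar structure. The name sum_cols_multi and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: group by `group`, then summarise every column in `cols`.
# `cols` accepts any tidyselect specification, e.g. c(a, b) or where(is.numeric).
sum_cols_multi <- function(df, group, cols) {
  df %>%
    group_by({{ group }}) %>%
    summarise(
      # Output columns are named "<column>_mean" and "<column>_sd"
      across({{ cols }}, list(mean = ~ mean(.x, na.rm = TRUE),
                              sd   = ~ sd(.x, na.rm = TRUE))),
      # count comes last so across() never selects it as a numeric column
      count = n()
    )
}

sum_cols_multi(iris, Species, c(Sepal.Length, Petal.Width))
```

In a recent dplyr, a call such as sum_cols_multi(iris, Species, where(is.numeric)) summarises every numeric column in one go; grouping columns are excluded from across() automatically.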