I have a function that calculates the group means of a data frame for a column that is chosen based on the content of a variable, VarName. The current function uses dplyr::summarize_, but I now see that this is deprecated, and I want to replace it before it is fully removed.
However, I'm not sure how to use the new unquoting to achieve what I'm trying to do. Here's my current code:
means<-summarize_(group_by(dat,Grade),.dots = setNames(paste0('mean(',VarName,',na.rm=TRUE)'),'means'))
I tried replacing the .dots part with means=mean(!!VarName, na.rm=TRUE), but that just returned the string inside VarName. What I need is for the string in VarName to be evaluated as a column name within dat, so that I get a column named "means" with the mean of each group. How can I achieve that with the new summarize?
Sample dataset for reproducibility:
VarName<-"Things"
dat<-data.frame(students=c("a","b","c","d","e"),Grade=c(2,2,2,3,3),varA=c(41:45),Things=c(90,100,80,75,80))
Thanks!
Turning this into a function and generalizing for arbitrary data, grouping variable, and value variable:
library(tidyverse)
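means <- function(data, group, value) {
  # Capture the unquoted column names as quosures
  group = enquo(group)
  value = enquo(value)
  # Build the output column name, e.g. "mean_Things"
  value_name = paste0("mean_", as_label(value))
  data %>% group_by(!!group) %>%
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}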
means(dat, Grade, Things)
  Grade mean_Things
  <dbl>       <dbl>
1  2.00        90.0
2  3.00        77.5
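For what it's worth, newer releases offer a shorter spelling of the same idea. The sketch below assumes dplyr 1.0.0 or later, where the embrace operator {{ }} replaces the enquo() / !! pair and can also be used inside a glue-style string on the left of :=. The name means_embrace is made up for illustration.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(
  students = c("a", "b", "c", "d", "e"),
  Grade    = c(2, 2, 2, 3, 3),
  varA     = 41:45,
  Things   = c(90, 100, 80, 75, 80)
)

# Hypothetical helper; assumes dplyr >= 1.0.0 for {{ }} in the column name
means_embrace <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    # "mean_{{ value }}" expands to e.g. "mean_Things"
    summarise("mean_{{ value }}" := mean({{ value }}, na.rm = TRUE))
}

res_embrace <- means_embrace(dat, Grade, Things)
res_embrace
```

This should produce the same two-row result as the enquo() version, with the column named mean_Things.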
If I understand your comment, how about the function below, which takes a string for the value argument:
means <- function(data, group, value) {
  group = enquo(group)
  value_name = paste0("mean_", value)
  value = sym(value)
  data %>% group_by(!!group) %>%
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}
VarName = "Things"
means(dat, Grade, VarName)
  Grade mean_Things
  <dbl>       <dbl>
1  2.00        90.0
2  3.00        77.5
Since the function is generalized, you can do this with any data frame. For example:
means(mtcars, cyl, "mpg")
    cyl mean_mpg
  <dbl>    <dbl>
1  4.00     26.7
2  6.00     19.7
3  8.00     15.1
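If you only need the one-off summary from the question rather than a reusable function, a minimal sketch is to index the .data pronoun with the string. This assumes a dplyr version that provides .data (0.7.0 or later, as far as I know); means_direct is just an illustrative name.

```r
library(dplyr)

# Sample data and variable from the question
VarName <- "Things"
dat <- data.frame(
  students = c("a", "b", "c", "d", "e"),
  Grade    = c(2, 2, 2, 3, 3),
  varA     = 41:45,
  Things   = c(90, 100, 80, 75, 80)
)

# .data[[VarName]] looks up the column whose name is stored in VarName
means_direct <- dat %>%
  group_by(Grade) %>%
  summarise(means = mean(.data[[VarName]], na.rm = TRUE))

means_direct
```

The result keeps the fixed column name "means" that the original summarize_ call produced.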
You can generalize the function still further. For example, this version takes an arbitrary number of grouping columns:
means <- function(data, value, ...) {
  group = quos(...)
  value_name = paste0("mean_", value)
  value = sym(value)
  data %>% group_by(!!!group) %>%
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}
VarName = "Things"
means(dat, VarName, students, Grade)
  students Grade mean_Things
  <fct>    <dbl>       <dbl>
1 a         2.00        90.0
2 b         2.00       100
3 c         2.00        80.0
4 d         3.00        75.0
5 e         3.00        80.0
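If the grouping columns also arrive as strings, one possible variant is sketched below. It assumes dplyr 1.0.0 or later for across(), all_of(), and the .groups argument; the function name means_by and its groups parameter are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: value is a single column name as a string,
# groups is a character vector of grouping column names
means_by <- function(data, value, groups) {
  value_name <- paste0("mean_", value)
  data %>%
    group_by(across(all_of(groups))) %>%
    summarise(
      !!value_name := mean(.data[[value]], na.rm = TRUE),
      .groups = "drop"  # return an ungrouped result
    )
}

res_by <- means_by(mtcars, "mpg", c("cyl", "am"))
res_by
```

With mtcars this gives one row per cyl/am combination and a mean_mpg column.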