Tags: r, dplyr, summarize, nse, quosure

Moving from deprecated summarize_ to new summarize in dplyr


I have a function that calculates group means on a grouped data frame, for a column chosen by the string stored in a variable, VarName. The function currently uses dplyr::summarize_, but I see that this is now deprecated, and I want to replace it before it is removed entirely.

However, I'm not sure how to use the new unquoting to achieve what I'm trying to do. Here's my current code:

means<-summarize_(group_by(dat,Grade),.dots = setNames(paste0('mean(',VarName,',na.rm=TRUE)'),'means'))

I tried replacing the .dots part with means=mean(!!VarName, na.rm=TRUE), but that just returned the string inside VarName. What I need is for the string in VarName to be evaluated as a column name within dat, so that I get a column named "means" containing the mean of each group. How can I achieve that with the new summarize?

Sample dataset for reproducibility:

VarName<-"Things"
dat<-data.frame(students=c("a","b","c","d","e"),Grade=c(2,2,2,3,3),varA=c(41:45),Things=c(90,100,80,75,80))

Thanks!


Solution

  • Turning this into a function and generalizing for arbitrary data, grouping variable, and value variable:

    library(tidyverse)
    
    means <- function(data, group, value) {
    
      group = enquo(group)
      value = enquo(value)
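      # as_label() turns the captured expression into a string; pasting the quosure itself and taking [2] is fragile
      value_name = paste0("mean_", as_label(value))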
    
      data %>% group_by(!!group) %>% 
        summarise(!!value_name := mean(!!value, na.rm=TRUE))
    }
    
    means(dat, Grade, Things)
    
      Grade mean_Things
      <dbl>       <dbl>
    1  2.00        90.0
    2  3.00        77.5
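
    As a possible alternative, and not something the answer above uses: newer releases (assuming dplyr 1.0 or later with rlang 0.4 or later) let the enquo()/!! pair be written with the embrace operator {{ }}. The function name means_embrace is hypothetical; treat this as a sketch under those version assumptions.

    ```r
    library(dplyr)

    # Sample data from the question
    dat <- data.frame(
      students = c("a", "b", "c", "d", "e"),
      Grade    = c(2, 2, 2, 3, 3),
      varA     = 41:45,
      Things   = c(90, 100, 80, 75, 80)
    )

    # Hypothetical helper; assumes dplyr >= 1.0 and rlang >= 0.4
    means_embrace <- function(data, group, value) {
      data %>%
        # {{ group }} captures and unquotes the bare column name in one step
        group_by({{ group }}) %>%
        # the glue-style name on the left of := builds "mean_<column>"
        summarise("mean_{{ value }}" := mean({{ value }}, na.rm = TRUE))
    }

    res <- means_embrace(dat, Grade, Things)
    print(res)
    ```

    The call is the same as before: bare column names for both group and value.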
    

    If I understand your comment, how about the function below, which takes a string for the value argument:

    means <- function(data, group, value) {
    
      group = enquo(group)
      value_name = paste0("mean_", value)
      value = sym(value)
    
      data %>% group_by(!!group) %>% 
        summarise(!!value_name := mean(!!value, na.rm=TRUE))
    }
    
    VarName = "Things"
    
    means(dat, Grade, VarName)
    
      Grade mean_Things
      <dbl>       <dbl>
    1  2.00        90.0
    2  3.00        77.5
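
    Applied directly to the call in the question, without a wrapper function, the same sym() plus !! idea might look like the sketch below. The second call uses the .data pronoun as an alternative way to look up a column by string; it is not part of the answer above. The result names means_sym and means_data are only illustrative.

    ```r
    library(dplyr)

    # Sample data and variable from the question
    VarName <- "Things"
    dat <- data.frame(
      students = c("a", "b", "c", "d", "e"),
      Grade    = c(2, 2, 2, 3, 3),
      varA     = 41:45,
      Things   = c(90, 100, 80, 75, 80)
    )

    # sym() converts the string to a symbol; !! unquotes it inside mean()
    means_sym <- summarise(
      group_by(dat, Grade),
      means = mean(!!sym(VarName), na.rm = TRUE)
    )

    # Alternative: subset the .data pronoun with the string
    means_data <- summarise(
      group_by(dat, Grade),
      means = mean(.data[[VarName]], na.rm = TRUE)
    )

    print(means_sym)
    ```

    Both calls return one row per Grade with a column named "means", which is what the original summarize_ call produced.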
    

    Since the function is generalized, you can do this with any data frame. For example:

    means(mtcars, cyl, "mpg")
    
        cyl mean_mpg
      <dbl>    <dbl>
    1  4.00     26.7
    2  6.00     19.7
    3  8.00     15.1
    

    You can generalize the function still further. For example, this version takes an arbitrary number of grouping columns:

    means <- function(data, value, ...) {
    
      group = quos(...)
      value_name = paste0("mean_", value)
      value = sym(value)
    
      data %>% group_by(!!!group) %>% 
        summarise(!!value_name := mean(!!value, na.rm=TRUE))
    }
    
    VarName = "Things"
    
    means(dat, VarName, students, Grade)
    
      students Grade mean_Things
      <fct>    <dbl>       <dbl>
    1 a         2.00        90.0
    2 b         2.00       100  
    3 c         2.00        80.0
    4 d         3.00        75.0
    5 e         3.00        80.0
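
    A shorter variant of the same idea, as a sketch: the dots can be forwarded to group_by() as they are, without quos() and !!!. The name means_dots is hypothetical, and the .data pronoun is used here in place of sym() for the string column.

    ```r
    library(dplyr)

    # Sample data from the question
    dat <- data.frame(
      students = c("a", "b", "c", "d", "e"),
      Grade    = c(2, 2, 2, 3, 3),
      varA     = 41:45,
      Things   = c(90, 100, 80, 75, 80)
    )

    # Hypothetical helper: `value` is a string, grouping columns are bare names
    means_dots <- function(data, value, ...) {
      value_name <- paste0("mean_", value)
      data %>%
        # `...` is passed through to group_by() unchanged
        group_by(...) %>%
        summarise(!!value_name := mean(.data[[value]], na.rm = TRUE))
    }

    res <- means_dots(dat, "Things", students, Grade)
    print(res)
    ```

    With more than one grouping column, recent dplyr versions may print a message about how the output is grouped; the summarised values are not affected.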