Changing names of resulting variables in custom dplyr function


Background

To speed up generating grouped summaries across multiple tables, and because I do most of that work in a dplyr workflow, I've drafted a simple function that generates the desired metrics:

# Function to generate summary table
generate_summary_tbl <- function(dataset, group_column, summary_column) {
    group_column   <- enquo(group_column)
    summary_column <- enquo(summary_column)
    dataset %>% 
        group_by(!!group_column) %>% 
        summarise(
            mean = mean(!!summary_column),
            sum  = sum(!!summary_column)
            # Other metrics that need to be generated frequently
        ) %>% 
        ungroup -> smryDta
    return(smryDta)
}

Example

The function works as desired:

>> mtcars %>% 
...     generate_summary_tbl(group_column = am, summary_column = mpg)
# A tibble: 2 x 3
     am     mean   sum
  <dbl>    <dbl> <dbl>
1     0 17.14737 325.8
2     1 24.39231 317.1

Problem

I would like to conditionally include the name of the column passed via summary_column = mpg in the names of the resulting columns.

Example results, useColName = TRUE

When called with useColName = TRUE the results should correspond to:

>> mtcars %>% 
...     generate_summary_tbl(group_column = am, summary_column = mpg,
                             useColName = TRUE)
# A tibble: 2 x 3
     am  mean_am sum_am
  <dbl>    <dbl>  <dbl>
1     0 17.14737  325.8
2     1 24.39231  317.1

The difference is the presence of the _am suffix in the variable names: mean_am, sum_am, and so on.

Ugly solution

The partial, ugly solution I have uses setNames:

# Function to generate summary table
generate_summary_tbl <-
    function(dataset,
             group_column,
             summary_column,
             useColName = TRUE) {
        group_column   <- enquo(group_column)
        summary_column <- enquo(summary_column)
        dataset %>%
            group_by(!!group_column) %>%
            summarise(mean = mean(!!summary_column),
                      sum  = sum(!!summary_column)) %>%
            ungroup -> smryDta

        if (useColName) {
            setNames(smryDta,
                     c(deparse(substitute(
                         group_column
                     )),
                     paste(
                         names(smryDta)[2:length(smryDta)], paste0("_", deparse(substitute(
                             group_column
                         )))
                     ))) -> smryDta
        }

        return(smryDta)
    }

Example

The returned column names almost match the desired results. I reckon I could employ some regex to clean them up, but a more efficient solution should be available.

mtcars %>% 
    generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
# A tibble: 2 x 3
  `~am` `mean _~am` `sum _~am`
  <dbl>       <dbl>      <dbl>
1     0    17.14737      325.8
2     1    24.39231      317.1

How can I get the desired column names, ideally making better use of quo or lazyeval?


Solution

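  • Maybe use rename_at. Note that this version adds the group column name as a prefix (am_mean) rather than a suffix (mean_am); swap the first two arguments of paste to get a suffix: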

    library(tidyverse)
    
    generate_summary_tbl <- function(dataset, group_column, summary_column, useColname = FALSE) {
        group_column   <- enquo(group_column)
        summary_column <- enquo(summary_column)
        dataset %>% 
            group_by(!!group_column) %>% 
            summarise(
                mean = mean(!!summary_column),
                sum  = sum(!!summary_column)
                # Other metrics that need to be generated frequently
            ) %>% 
            ungroup -> smryDta
    
        if (useColname) 
          smryDta <- smryDta %>%  
          rename_at(
            vars(-one_of(quo_name(group_column))), 
            ~paste(quo_name(group_column), .x, sep="_")
          )
    
        return(smryDta)
    }
    
    mtcars %>% generate_summary_tbl(am, mpg)
    # # A tibble: 2 x 3
    #      am     mean   sum
    #   <dbl>    <dbl> <dbl>
    # 1     0 17.14737 325.8
    # 2     1 24.39231 317.1
    mtcars %>% generate_summary_tbl(am, mpg, useColname = TRUE)
    # # A tibble: 2 x 3
    #      am  am_mean am_sum
    #   <dbl>    <dbl>  <dbl>
    # 1     0 17.14737  325.8
    # 2     1 24.39231  317.1
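
If the suffix form from the question (mean_am, sum_am) is what you need, the same idea can be sketched with rename_with, assuming dplyr 1.0.0 or later and a reasonably recent rlang for as_label. The function name generate_summary_tbl_suffix and the helper variable group_name are made up for this illustration and are not part of the original answer:

```r
library(dplyr)
library(rlang)

# Hypothetical variant of generate_summary_tbl that appends the group
# column name as a suffix instead of a prefix.
generate_summary_tbl_suffix <- function(dataset, group_column, summary_column,
                                        useColName = FALSE) {
    group_column   <- enquo(group_column)
    summary_column <- enquo(summary_column)

    # Plain string version of the captured column name, e.g. "am"
    group_name <- as_label(group_column)

    smryDta <- dataset %>%
        group_by(!!group_column) %>%
        summarise(
            mean = mean(!!summary_column),
            sum  = sum(!!summary_column)
        ) %>%
        ungroup()

    if (useColName) {
        # Rename every column except the grouping column
        smryDta <- smryDta %>%
            rename_with(~ paste(.x, group_name, sep = "_"),
                        .cols = -all_of(group_name))
    }

    smryDta
}

res <- mtcars %>% generate_summary_tbl_suffix(am, mpg, useColName = TRUE)
names(res)
```

Converting the quosure to a string with as_label (or quo_name, as above) is what avoids the `~am` in the setNames attempt: there, deparse(substitute(group_column)) was deparsing the quosure itself rather than the bare column name, and paste's default separator added the stray space.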