Using column names as function arguments


I'm using dplyr to aggregate a column of a data frame, as shown below.

> data <- data.frame(a=rep(1:2,3), b=c(6:11))
> data
  a  b
1 1  6
2 2  7
3 1  8
4 2  9
5 1 10
6 2 11
> data %>% group_by(a) %>% summarize(tot=sum(b))
# A tibble: 2 x 2
      a   tot
  <int> <int>
1     1    24
2     2    27

This works as expected. However, I want to wrap it in a reusable function so that the column name can be passed as an argument.

Based on answers to related questions, I tried the following:

sumByColumn <- function(df, colName) {
  df %>%
  group_by(a) %>%
  summarize(tot=sum(colName))
  df
}

However, I can't get it to work, whether I pass the column name as a string or as a bare name:

> sumByColumn(data, "b")

 Error in summarise_impl(.data, dots) : 
  Evaluation error: invalid 'type' (character) of argument. 

> sumByColumn(data, b)

 Error in summarise_impl(.data, dots) : 
  Evaluation error: object 'b' not found. 

Solution

  • This works with the tidy evaluation syntax in recent versions of dplyr (as described on GitHub). Convert the string to a symbol with sym() and unquote it with !!:

    library(dplyr)
    library(rlang)
    sumByColumn <- function(df, colName) {
      df %>%
        group_by(a) %>%
        summarize(tot = sum(!! sym(colName)))
    }
    
    sumByColumn(data, "b")
    ## A tibble: 2 x 2
    #      a   tot
    #  <int> <int>
    #1     1    24
    #2     2    27
    

    Alternatively, to pass b as a bare (unquoted) column name, capture the argument with enquo() and unquote it with !!:

    library(dplyr)
    sumByColumn <- function(df, colName) {
      myenc <- enquo(colName)
      df %>%
        group_by(a) %>%
        summarize(tot = sum(!!myenc))
    }
    
    sumByColumn(data, b)
    ## A tibble: 2 x 2
    #      a   tot
    #  <int> <int>
    #1     1    24
    #2     2    27
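
For completeness, here is a sketch of how the same two functions might look with newer shorthand, assuming a reasonably recent dplyr (1.0 or later). The .data pronoun handles the string case and the embrace operator {{ }} handles the bare-name case. The function names sum_by_string and sum_by_bare are made up for this illustration and are not part of dplyr.

```r
library(dplyr)

# Same example data as in the question.
data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Variant 1: column name passed as a string.
# .data[[colName]] looks up the column named by the string in the
# current data (here, the current group), so sym() and !! are not needed.
sum_by_string <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(.data[[colName]]))
}

# Variant 2: column passed as a bare (unquoted) name.
# {{ colName }} is shorthand for capturing the argument with enquo()
# and unquoting it with !!.
sum_by_bare <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum({{ colName }}))
}

res_string <- sum_by_string(data, "b")
res_bare   <- sum_by_bare(data, b)
```

Both calls should give the same per-group totals as the versions above (24 for a = 1 and 27 for a = 2). Which variant to use mostly depends on whether callers have the column name as a string, for example read from a config file, or type it directly.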