r dplyr tidyverse user-defined-functions tidyeval

How to get specific values out of a list of values passed to one argument of a UDF with tidyeval

I used tidyeval to write a short function which takes grouping variables as an input, groups the mtcars dataset and counts the number of occurences per group:

test_function <- function(grps){
  mtcars %>% 
    group_by(across({{grps}})) %>% 
    summarise(Count = n())
}

test_function(grps = c(cyl, gear))

---

    cyl  gear Count
  <dbl> <dbl> <int>
1     4     3     1
2     4     4     8
3     4     5     2
4     6     3     2
5     6     4     4
6     6     5     1
7     8     3    12
8     8     5     2

Now imagine for that example I want a subtotal column for each group cyl. So how many cars have 4 (6,8) cylinders? This is what the result should look like:

test_function(grps = c(cyl, gear), subtotalrows = TRUE)    ### example function execution

---

          cyl     gear Count
        <dbl>    <dbl> <int>
 1        4        3     1
 2        4        4     8
 3        4        5     2
 4        4    total    11
 5        6        3     2
 6        6        4     4
 7        6        5     1
 8        6    total     7
 9        8        3    12
10        8        5     2
11        8    total    14

In this case the subtotal columns I am looking for can simply be produced with the same function but with one less grouping variable:

test_function(grps = cyl)

---

    cyl Count
  <dbl> <int>
1     4    11
2     6     7
3     8    14

But since I don't want to use the function in itself (not even sure wether this is possible in R) I would like to go for a different approach: As far as I know the best (and only way) to create subtotal rows so far is by calculating them independently and then binding them row wise to the grouped table (i.e.: rbind, bind_rows). In my case that means only take the first grouping variable, create the subtotal rows and later on bind them to the table. But here is where I have problems with the tidyeval syntax. Here is in pseudocode what I would like to do in the function:

test_function <- function(grps, subtotalrows = TRUE){
  
  
  grouped_result <- mtcars %>% 
    group_by(across({{grps}})) %>% 
    summarise(Count = n())
  
  if(subtotalrows == FALSE){
    
    return(grouped_result)
    
  } else {
    
    #pseudocode
    
    group_for_subcalculation <- grps[[1]] #I want the first element of the grps argument
    
    subtotal_result <- mtcars %>% 
      group_by(across({{group_for_subcalculation}})) %>% 
      summarise(Count = n()) %>% 
      mutate(grps[[2]] := "total") %>% 
      arrange(grps[[1]], grps[[2]], Count)
  
    return(rbind(grouped_result, subtotal_result))
  }
}

So, two questions: I am curious how I can extract the first column name passed by grps and work with it in the following code. Second, this pseudocode example is specific for 2 columns passed by grps. Imagine I want to pass 3 or more even. How would you do that (loops)?

Solution

Try this function -

library(dplyr)

test_function <- function(grps, subtotalrows = TRUE){
  grouped_data <- mtcars %>%  group_by(across({{grps}}))
  groups <- group_vars(grouped_data)
  col_to_change <- groups[length(groups)] #Last value in grps
  grouped_result <- grouped_data %>% summarise(Count = n())
  
  if(!subtotalrows) return(grouped_result)
  else {
    result <- grouped_result %>%
      summarise(Count = sum(Count), 
                !!col_to_change := 'Total') %>%
      bind_rows(grouped_result %>%
                  mutate(!!col_to_change := as.character(.data[[col_to_change]]))) %>%
      select(all_of(groups), Count) %>%
      arrange(across(all_of(groups)))
  }
  return(result)
}

Test the function -

test_function(grps = c(cyl, gear))

# A tibble: 11 x 3
#     cyl gear  Count
#   <dbl> <chr> <int>
# 1     4 3         1
# 2     4 4         8
# 3     4 5         2
# 4     4 Total    11
# 5     6 3         2
# 6     6 4         4
# 7     6 5         1
# 8     6 Total     7
# 9     8 3        12
#10     8 5         2
#11     8 Total    14

test_function(grps = c(cyl, gear), FALSE)

#    cyl  gear Count
#  <dbl> <dbl> <int>
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

For 3 variables -

test_function(grps = c(cyl, gear, carb))

#    cyl  gear carb  Count
#   <dbl> <dbl> <chr> <int>
# 1     4     3 1         1
# 2     4     3 Total     1
# 3     4     4 1         4
# 4     4     4 2         4
# 5     4     4 Total     8
# 6     4     5 2         2
# 7     4     5 Total     2
# 8     6     3 1         2
# 9     6     3 Total     2
#10     6     4 4         4
#11     6     4 Total     4
#12     6     5 6         1
#13     6     5 Total     1
#14     8     3 2         4
#15     8     3 3         3
#16     8     3 4         5
#17     8     3 Total    12
#18     8     5 4         1
#19     8     5 8         1
#20     8     5 Total     2