Search code examples
rdplyrtidyverseuser-defined-functionstidyeval

How to get specific values out of a list of values passed to one argument of a UDF with tidyeval


I used tidyeval to write a short function which takes grouping variables as an input, groups the mtcars dataset and counts the number of occurences per group:

test_function <- function(grps){
  mtcars %>% 
    group_by(across({{grps}})) %>% 
    summarise(Count = n())
}

test_function(grps = c(cyl, gear))

---

    cyl  gear Count
  <dbl> <dbl> <int>
1     4     3     1
2     4     4     8
3     4     5     2
4     6     3     2
5     6     4     4
6     6     5     1
7     8     3    12
8     8     5     2

Now imagine for that example I want a subtotal column for each group cyl. So how many cars have 4 (6,8) cylinders? This is what the result should look like:

test_function(grps = c(cyl, gear), subtotalrows = TRUE)    ### example function execution

---

          cyl     gear Count
        <dbl>    <dbl> <int>
 1        4        3     1
 2        4        4     8
 3        4        5     2
 4        4    total    11
 5        6        3     2
 6        6        4     4
 7        6        5     1
 8        6    total     7
 9        8        3    12
10        8        5     2
11        8    total    14

In this case the subtotal columns I am looking for can simply be produced with the same function but with one less grouping variable:

test_function(grps = cyl)

---

    cyl Count
  <dbl> <int>
1     4    11
2     6     7
3     8    14

But since I don't want to use the function in itself (not even sure wether this is possible in R) I would like to go for a different approach: As far as I know the best (and only way) to create subtotal rows so far is by calculating them independently and then binding them row wise to the grouped table (i.e.: rbind, bind_rows). In my case that means only take the first grouping variable, create the subtotal rows and later on bind them to the table. But here is where I have problems with the tidyeval syntax. Here is in pseudocode what I would like to do in the function:

test_function <- function(grps, subtotalrows = TRUE){
  
  
  grouped_result <- mtcars %>% 
    group_by(across({{grps}})) %>% 
    summarise(Count = n())
  
  if(subtotalrows == FALSE){
    
    return(grouped_result)
    
  } else {
    
    #pseudocode
    
    group_for_subcalculation <- grps[[1]] #I want the first element of the grps argument
    
    subtotal_result <- mtcars %>% 
      group_by(across({{group_for_subcalculation}})) %>% 
      summarise(Count = n()) %>% 
      mutate(grps[[2]] := "total") %>% 
      arrange(grps[[1]], grps[[2]], Count)
  
    return(rbind(grouped_result, subtotal_result))
  }
}

So, two questions: I am curious how I can extract the first column name passed by grps and work with it in the following code. Second, this pseudocode example is specific for 2 columns passed by grps. Imagine I want to pass 3 or more even. How would you do that (loops)?


Solution

  • Try this function -

    library(dplyr)
    
    test_function <- function(grps, subtotalrows = TRUE){
      grouped_data <- mtcars %>%  group_by(across({{grps}}))
      groups <- group_vars(grouped_data)
      col_to_change <- groups[length(groups)] #Last value in grps
      grouped_result <- grouped_data %>% summarise(Count = n())
      
      if(!subtotalrows) return(grouped_result)
      else {
        result <- grouped_result %>%
          summarise(Count = sum(Count), 
                    !!col_to_change := 'Total') %>%
          bind_rows(grouped_result %>%
                      mutate(!!col_to_change := as.character(.data[[col_to_change]]))) %>%
          select(all_of(groups), Count) %>%
          arrange(across(all_of(groups)))
      }
      return(result)
    }
    

    Test the function -

    test_function(grps = c(cyl, gear))
    
    # A tibble: 11 x 3
    #     cyl gear  Count
    #   <dbl> <chr> <int>
    # 1     4 3         1
    # 2     4 4         8
    # 3     4 5         2
    # 4     4 Total    11
    # 5     6 3         2
    # 6     6 4         4
    # 7     6 5         1
    # 8     6 Total     7
    # 9     8 3        12
    #10     8 5         2
    #11     8 Total    14
    
    test_function(grps = c(cyl, gear), FALSE)
    
    #    cyl  gear Count
    #  <dbl> <dbl> <int>
    #1     4     3     1
    #2     4     4     8
    #3     4     5     2
    #4     6     3     2
    #5     6     4     4
    #6     6     5     1
    #7     8     3    12
    #8     8     5     2
    

    For 3 variables -

    test_function(grps = c(cyl, gear, carb))
    
    #    cyl  gear carb  Count
    #   <dbl> <dbl> <chr> <int>
    # 1     4     3 1         1
    # 2     4     3 Total     1
    # 3     4     4 1         4
    # 4     4     4 2         4
    # 5     4     4 Total     8
    # 6     4     5 2         2
    # 7     4     5 Total     2
    # 8     6     3 1         2
    # 9     6     3 Total     2
    #10     6     4 4         4
    #11     6     4 Total     4
    #12     6     5 6         1
    #13     6     5 Total     1
    #14     8     3 2         4
    #15     8     3 3         3
    #16     8     3 4         5
    #17     8     3 Total    12
    #18     8     5 4         1
    #19     8     5 8         1
    #20     8     5 Total     2