Passing two sets of column names to a function


I am trying to pass two sets of column names to a function and do something with them using dplyr. Normally, for one set, I would use the ellipsis (...) and convert it to quosures with enquos(). But now I have two sets of column names, so I thought about using lists to store them. How can I make this work in the most efficient way? (Answers using functions from purrr, rlang or any other package are most welcome.)
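
For context, the single-set pattern mentioned above can be sketched roughly as follows. The helper name count_each and the toy data are hypothetical and only serve as illustration; this is not the function from the question.

```r
library(dplyr)

# Hypothetical helper: counts each column passed through `...` separately
count_each <- function(data, ...) {
  cols <- enquos(...)  # capture the bare column names as a list of quosures
  lapply(cols, function(col) count(data, !!col))
}

# Toy data, made up for this sketch
toy <- tibble(x = c("a", "a", "b"), y = c("d", "e", "e"))
counts <- count_each(toy, x, y)
```

This only works for one set of columns because a function has a single `...`, which is why a second set needs some other container.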

Packages and example of data

library(dplyr) # I normally load the whole library(tidyverse)
library(tidyr) # needed for spread()

some.data <- tibble(col1 = sample(letters[1:3], 500, replace = TRUE),
                    col2 = sample(letters[1:3], 500, replace = TRUE),
                    col3 = sample(letters[4:6], 500, replace = TRUE),
                    col4 = sample(letters[4:6], 500, replace = TRUE))

My function (put simply) looks like this:

cross_table <- function(data = NULL, list1 = NULL, list2 = NULL){

   for(l1 in list1){
      for(l2 in list2){

         data.out <- data %>% 
            count(l1, l2) %>% 
            spread(l2, n, fill = 0, drop = FALSE)

         print(data.out) #Just to show it works. I want to use 'data.out' object later on

      }
   }
}

and I want to call the function like this (without passing the column names as strings):

some.data %>%
   cross_table(list1 = list(col1, col2), list2 = list(col3, col4))

Solution

  • The vars() function could be a good fit here. Use it in place of list() in your function arguments: it captures the bare column names as a list of quosures. I saw an example in this SO answer, and it extends to your situation pretty handily.

    That, plus some tidyeval within your loop, would look like:

    cross_table <- function(data = NULL, list1 = NULL, list2 = NULL){
    
        for(l1 in list1){
            for(l2 in list2){
    
                # l1 and l2 are already quosures (created by vars()),
                # so they can be unquoted with !! directly; no enquo() needed
                data.out <- data %>%
                    count(!!l1, !!l2) %>%
                    spread(!!l2, n, fill = 0, drop = FALSE)
    
                print(data.out)
            }
        }
    }
    
    some.data %>%
       cross_table(list1 = vars(col1, col2), list2 = vars(col3, col4))
    
    
    # A tibble: 3 x 4
      col1      d     e     f
      <chr> <dbl> <dbl> <dbl>
    1 a        58    61    53
    2 b        38    59    47
    3 c        65    59    60
    # A tibble: 3 x 4
      col1      d     e     f
      <chr> <dbl> <dbl> <dbl>
    1 a        53    61    58
    2 b        44    47    53
    3 c        56    62    66
    # A tibble: 3 x 4
      col2      d     e     f
      <chr> <dbl> <dbl> <dbl>
    1 a        55    60    51
    2 b        57    67    56
    3 c        49    52    53
    # A tibble: 3 x 4
      col2      d     e     f
      <chr> <dbl> <dbl> <dbl>
    1 a        51    56    59
    2 b        63    55    62
    3 c        39    59    56
    

    You can also use alist() in place of list() (which it looks like I learned at one point but have since forgotten :-D).
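
A minimal sketch of the alist() variant, assuming the same example data as in the question. Since the question mentions wanting to use data.out later, this version collects the tables in a named list instead of printing them. The function name cross_table_list and the "col1_col3" style of list names are assumptions made for illustration, not part of the original code.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of cross_table() that returns its results
cross_table_list <- function(data, list1, list2) {
  out <- list()
  for (l1 in list1) {
    for (l2 in list2) {
      # alist() stores bare symbols, which !! can unquote directly
      key <- paste(as.character(l1), as.character(l2), sep = "_")
      out[[key]] <- data %>%
        count(!!l1, !!l2) %>%
        spread(!!l2, n, fill = 0, drop = FALSE)
    }
  }
  out
}

set.seed(1)  # only to make the sketch reproducible
some.data <- tibble(col1 = sample(letters[1:3], 500, replace = TRUE),
                    col2 = sample(letters[1:3], 500, replace = TRUE),
                    col3 = sample(letters[4:6], 500, replace = TRUE),
                    col4 = sample(letters[4:6], 500, replace = TRUE))

tables <- some.data %>%
  cross_table_list(list1 = alist(col1, col2), list2 = alist(col3, col4))

names(tables)
```

Each table can then be pulled out by name, e.g. tables[["col1_col3"]]. The same loop body should also work with vars(), but as.character() on a quosure does not give a clean column name, so the naming step would need rlang::as_label() or similar there.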