Tags: r, rlang, drake-r-package

Generate workflow plan for all combinations of inputs in Drake?


I'm trying to create a workflow plan that runs some function my_function(x, y) for all combinations of inputs in my_dataset, but I'm stuck on how to generate the commands for drake's workflow without using paste.

Consider:

library(drake)
library(dplyr)

A <- 'apple'
B <- 'banana'
C <- 'carrot'

my_function <- function(x, y)
    paste(x, y, sep='|IT WORKS|')

my_function(A, B)

combos <- combn(c('A', 'B', 'C'), 2) %>% 
    t() %>% 
    as_data_frame()

targets <- apply(combos, 1, paste, collapse = '_')

commands <- paste0('my_function(', apply(combos, 1, paste, collapse = ', '), ')') 

my_plan <- data_frame(target = targets, command = commands)
make(my_plan)

Output:

> my_plan
# A tibble: 3 x 2
  target command          
  <chr>  <chr>            
1 A_B    my_function(A, B)
2 A_C    my_function(A, C)
3 B_C    my_function(B, C)

The above code works, but I am using paste0 to generate the function call. I don't think this is optimal and it scales poorly. Is there a better way to generate these plans? This may be less of a drake question and more of an rlang question.


Solution

  • DISCLAIMER: This answer shows how to compose expressions using the rlang framework. However, drake expects commands as character strings, so the final expressions need to be converted to strings.

    We begin by capturing A, B and C as symbols using quote, then computing all possible pairwise combinations using the code you already have:

    CB <- combn( list(quote(A), quote(B), quote(C)), 2 ) %>% 
        t() %>% as_data_frame()
    # # A tibble: 3 x 2
    #   V1       V2      
    #   <list>   <list>  
    # 1 <symbol> <symbol>
    # 2 <symbol> <symbol>
    # 3 <symbol> <symbol>
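
If the variable names already live in a character vector, writing quote() once per name may become tedious. A minimal sketch, assuming rlang is installed and that the names are the three from the question: rlang::syms() converts strings to symbols, and combn() with simplify = FALSE returns one list per pair. The names var_names, var_syms and pairs are made up for this illustration.

```r
library(rlang)

# Variable names as plain strings (the same three names used in the question)
var_names <- c("A", "B", "C")

# syms() converts each string into a symbol, much like quote() on each name
var_syms <- syms(var_names)

# combn() accepts a list; simplify = FALSE returns one list per combination
pairs <- combn(var_syms, 2, simplify = FALSE)

length(pairs)  # number of pairwise combinations
pairs[[1]]     # a list holding the symbols A and B
```

This keeps each pair as a small list instead of a two-column tibble, which can be convenient when the number of arguments per call is not fixed at two.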
    

    We can now use purrr::map2 to traverse the two columns in parallel and compose one expression per row:

    CMDs <- purrr::map2( CB$V1, CB$V2, ~rlang::expr( my_function((!!.x), (!!.y)) ) )
    # [[1]]
    # my_function(A, B)
    
    # [[2]]
    # my_function(A, C)
    
    # [[3]]
    # my_function(B, C)
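
As an alternative to writing the call out inside rlang::expr(), the same expressions can be assembled with rlang::call2() and the !!! splice operator. This is only a sketch, assuming a version of rlang that provides call2(); it has the possible advantage of not hard-coding the number of arguments. The objects pairs and cmds are names made up for this example, while my_function, A, B and C are copied from the question.

```r
library(rlang)

# Inputs and function copied from the question
A <- 'apple'
B <- 'banana'
C <- 'carrot'
my_function <- function(x, y) paste(x, y, sep = '|IT WORKS|')

# One list of two symbols per pairwise combination
pairs <- combn(syms(c("A", "B", "C")), 2, simplify = FALSE)

# call2() builds a call from a function name plus arguments;
# !!! splices the symbols of each pair in as those arguments
cmds <- lapply(pairs, function(p) call2("my_function", !!!p))

cmds[[1]]        # the unevaluated call my_function(A, B)
eval(cmds[[1]])  # evaluates that call using the variables defined above
```

Switching from pairs to triples would then only require changing the 2 passed to combn(), provided my_function accepts three arguments.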
    

    As mentioned above, drake expects character strings, so we have to convert our expressions to strings:

    commands <- purrr::map_chr( CMDs, rlang::quo_name )
    # [1] "my_function(A, B)" "my_function(A, C)" "my_function(B, C)"
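
For longer commands, rlang::expr_text() may be a safer choice for the conversion, since it deparses the whole expression, whereas quo_name() is, as far as I know, geared towards short labels. The sketch below is one possible way to put the pieces together and rebuild the two-column plan from the question. It assumes rlang is installed, uses a plain data.frame instead of a tibble, and does not call make(); the names pairs, cmds, targets and my_plan are illustrative.

```r
library(rlang)

# Symbols for every pairwise combination, and one call per pair
pairs <- combn(syms(c("A", "B", "C")), 2, simplify = FALSE)
cmds  <- lapply(pairs, function(p) call2("my_function", !!!p))

# expr_text() deparses each call into a single string
commands <- vapply(cmds, expr_text, character(1))

# Target names: the symbol names of each pair joined with "_"
targets <- vapply(
  pairs,
  function(p) paste(vapply(p, as_string, character(1)), collapse = "_"),
  character(1)
)

# Same two columns as my_plan in the question
my_plan <- data.frame(target = targets, command = commands,
                      stringsAsFactors = FALSE)
my_plan
```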
    

    The rest of your code should work as before.

    Ultimately, it's up to you to decide whether expression arithmetic or string arithmetic is more efficient / readable for your application. One additional thing to mention is the stringr package, which might make string arithmetic more pleasant to do.
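
For comparison, here is a rough sketch of the string-based route with stringr, assuming the package is installed. str_c() is vectorised over its arguments, so the apply() calls from the question are not needed. The variable combos here is a plain character matrix, not the tibble from the question.

```r
library(stringr)

# All pairwise combinations of the variable names, one pair per row
combos <- t(combn(c("A", "B", "C"), 2))

# str_c() joins its arguments element-wise, producing one string per row
targets  <- str_c(combos[, 1], combos[, 2], sep = "_")
commands <- str_c("my_function(", combos[, 1], ", ", combos[, 2], ")")

targets
commands
```

Whether this reads better than the expression-based version is largely a matter of taste; the string version is shorter, while the expression version cannot produce syntactically invalid commands.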