I have a general piece of code A that comes up repeatedly in a series of programs. Each instance of A assumes the form
output_data = input_data %>%
  do common operations %>%
  do specific dplyr methods that vary from instance to instance %>%
  do more common operations
Because A is repeated so often, it makes sense to wrap this code in a function. To handle the instance-specific dplyr calls, I want to pass the dplyr methods into the function as arguments. How can I pass multiple dplyr methods, each with an arbitrary number of conditions, into a function in a succinct way?
It is not too hard to pass a single dplyr method, with an arbitrary number of arguments, into a function, i.e.
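library(dplyr)  # provides %>%, filter() and select()
insert_dplyr_method = function(input_data, dplyr_method, ...) {
  # forward all extra arguments to the supplied dplyr function
  output_data = input_data %>%
    dplyr_method(...)
  return(output_data)
}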
Test
dframe = data.frame(start = c(1,1,1,2,2,3,3,3,3),
                    middle = sample(1:9),
                    end = c(1,2,3,1,2,1,2,3,4))
dframe_1 = insert_dplyr_method(dframe,
                               dplyr::filter,
                               start == 1,
                               end == 2)
dframe_2 = insert_dplyr_method(dframe,
                               dplyr::select,
                               all_of("start"))
What I would really like to do is pass in n dplyr methods, each with an arbitrary number of arguments, i.e. for the n = 2 case something like
insert_dplyr_method_2 = function(input_data, dplyr_method_1, ...1, dplyr_method_2, ...2) {
  # pseudocode: ...1 and ...2 stand for two separate sets of arguments
  output_data = input_data %>%
    dplyr_method_1(...1) %>%
    dplyr_method_2(...2)
  return(output_data)
}
The only way I could think of to do this would be to pass the dplyr methods and their corresponding arguments into the function as a list, i.e.
dplyr_methods = list(c(dplyr_method_1, ...), c(dplyr_method_2, ...), etc.)
and then call each one with do.call() (see here, here and here), though I was unable to get it to work.
Could anyone show me how to do this? I'm also open to better approaches if anyone knows of one.
1) Instead of passing the functions and arguments of the varying portions, pass a pipeline with the arguments already filled in to the main function, do_all. Below we use your example, except that we have added a non-varying sum at the end to show how that works. Note that . %>% whatever is magrittr syntax for defining a function which passes its input to whatever.
library(dplyr)
set.seed(123)
do_all <- function(data, fun) data %>% fun %>% sum # main function
dframe = data.frame(start = c(1,1,1,2,2,3,3,3,3),
                    middle = sample(1:9),
                    end = c(1,2,3,1,2,1,2,3,4))
fun <- . %>% filter(start == 1, end == 2) %>% select(start)
do_all(dframe, fun)
## [1] 1
fun <- . %>% filter(start == 1, end == 2) %>% select(end)
do_all(dframe, fun)
## [1] 2
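To make the . %>% whatever idiom a little more concrete, here is a minimal sketch showing that such a pipeline is an ordinary one-argument function, so do_all would accept a plain function in its place. The names fun_pipe and fun_plain are made up for illustration, and middle is set to 1:9 rather than sample(1:9) only to keep the example deterministic.

```r
library(dplyr)

dframe <- data.frame(start = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                     middle = 1:9,  # fixed values instead of sample(1:9)
                     end = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# A pipeline that starts with `.` is a magrittr functional sequence,
# i.e. a function of one argument.
fun_pipe <- . %>% filter(start == 1, end == 2) %>% select(start)

# The same steps written as an explicit function (illustrative name).
fun_plain <- function(d) d %>% filter(start == 1, end == 2) %>% select(start)

is.function(fun_pipe)                           # the pipeline is a function
identical(fun_pipe(dframe), fun_plain(dframe))  # both forms give the same result
```

Either form can be handed to do_all as its fun argument, so the choice between them is mostly a matter of taste.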
2) Alternatively, define the pre- and post-processing pipelines and then just run the entire pipeline each time. pre and post are the non-varying portions.
pre <- . %>% identity
post <- . %>% sum
dframe %>% pre %>% filter(start == 1, end == 2) %>% select(start) %>% post
## [1] 1
dframe %>% pre %>% filter(start == 1, end == 2) %>% select(end) %>% post
## [1] 2
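If you still prefer the list-plus-do.call() idea from the question, the following is one possible sketch rather than a definitive implementation. The helper apply_steps and the fn/args list layout are hypothetical names chosen here. The key assumption is that the conditions must be stored unevaluated (here with base alist()), because writing start == 1 directly inside list() or c() would try to evaluate it before dplyr ever sees it.

```r
library(dplyr)

# Hypothetical helper: applies each (function, arguments) pair in turn,
# feeding the result of one step into the next.
apply_steps <- function(input_data, steps) {
  Reduce(
    function(data, step) do.call(step$fn, c(list(data), step$args)),
    steps,
    input_data
  )
}

dframe <- data.frame(start = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                     middle = 1:9,  # fixed values instead of sample(1:9)
                     end = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# alist() keeps each argument as an unevaluated expression, so dplyr
# can evaluate it against the data frame's columns later.
steps <- list(
  list(fn = dplyr::filter, args = alist(start == 1, end == 2)),
  list(fn = dplyr::select, args = alist(end))
)

result <- apply_steps(dframe, steps)
```

Inside a wrapper, the common operations would go before and after the apply_steps call. Compared with 1) and 2), this takes more ceremony to express the same steps, so it is probably only worth it when the steps have to be built up programmatically.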