Passing an expression into `MoreArgs` of `mapply`


I'm doing some programming with dplyr and am curious how to pass an unevaluated expression as an argument to mapply, specifically through MoreArgs.

Consider a simple function F that subsets a data.frame based on some ids and a time_range, then outputs a summary statistic based on some other column x.

library(dplyr)
library(data.table)  # provides %chin% (used in F) and CJ() (used for the example data)
F <- function(ids, time_range, df, date_column, x) {
    date_column <- enquo(date_column)
    x <- enquo(x)
    df %>%
        filter(person_id %chin% ids) %>%
        filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
        summarise(newvar = sum(!!x))
}

We can make up some example data to which we can apply our function F.

person_ids <- lapply(1:2, function(i) sample(letters, size = 10))
time_ranges <- lapply(list(c("2014-01-01", "2014-12-31"),
                           c("2015-01-01", "2015-12-31")), as.Date)

require(data.table)
dt <- CJ(person_id = letters,
         date_col  = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2015-12-31'), by = '1 day'))
dt[, z := rnorm(nrow(dt))]  # The variable we will later sum over, i.e. apply F to.

We can successfully apply our function to each of our inputs.

F(person_ids[[1]], time_ranges[[1]], dt, date_col, z)
F(person_ids[[2]], time_ranges[[2]], dt, date_col, z)

So I could write a simple for-loop to solve my problem. But if we try to wrap everything in mapply instead, we get an error.

mapply(F, ids = person_ids, time_range = time_ranges, MoreArgs = list(df = dt, date_column = date_col, x = z))

# Error in mapply... object 'date_col' not found

Solution

  • In mapply, MoreArgs is supplied as a list, and list() evaluates its elements as soon as it is called. R therefore looks for objects named date_col and z in the calling environment before F ever runs, which causes the error. As suggested by @Gregor, you can quote the MoreArgs elements that should not be evaluated immediately, so that F receives the bare column names. This can be done with base quote or dplyr quo:

    mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quote(date_col), quote(z)))
    
    mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quo(date_col), quo(z)))
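
    To see why quoting helps, here is a minimal base R sketch that needs no packages. It is only an illustration: not_defined_anywhere is a made-up name standing in for date_col, and substitute() stands in for what enquo() does inside F.

    ```r
    # list() evaluates its elements immediately, so an undefined name fails
    # while the list is being built, before mapply() is even called.
    eager <- tryCatch(
      list(col = not_defined_anywhere),
      error = function(e) conditionMessage(e)
    )

    # quote() stores the bare symbol instead, so building the list succeeds.
    deferred <- list(col = quote(not_defined_anywhere))

    # mapply() places that symbol into each call it builds, where
    # substitute() (standing in for enquo()) can capture it unevaluated.
    res <- mapply(
      function(i, col) paste(i, deparse(substitute(col))),
      1:2,
      MoreArgs = deferred
    )
    ```

    Here eager holds the "not found" error message, while res is built without error because the symbol is never evaluated.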
    

    Another option is to use map2 from the purrr package, the tidyverse counterpart of mapply for two input vectors. map2 forwards its extra arguments to F through `...`, and R evaluates such arguments lazily, so date_col and z stay unevaluated until F captures them with enquo. No quoting is needed:

    library(purrr)
    
    map2(person_ids, time_ranges, F, dt, date_col, z)
    
    [[1]]
        newvar
    1 40.23419
    
    [[2]]
        newvar
    1 71.42327
    

    More generally, you could use pmap, which iterates in parallel over any number of input vectors:

    pmap(list(person_ids, time_ranges), F, dt, date_col, z)
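
    The reason the extra arguments survive can be sketched in base R. This is a simplified stand-in, not purrr's actual implementation: apply_each and inner are hypothetical names, and substitute() again plays the role of enquo().

    ```r
    # Hypothetical inner function: captures `col` without evaluating it.
    inner <- function(i, col) paste(i, deparse(substitute(col)))

    # Hypothetical iterator that forwards extra arguments through `...`.
    # Arguments passed this way stay unevaluated until the callee uses them.
    apply_each <- function(xs, f, ...) {
      lapply(xs, function(x) f(x, ...))
    }

    # `not_defined_anywhere` is a made-up name that exists nowhere, yet the
    # call succeeds because nothing ever forces its evaluation.
    out <- apply_each(1:2, inner, not_defined_anywhere)
    ```

    The same call would fail if the extra argument were wrapped in list() first, which is the situation with MoreArgs.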