
R function with expression as parameter for dplyr summarise


Okay, this feels like it should be relatively easy, but although I have tried literally dozens of approaches with quote, eval, substitute, enquote, parse, summarize_, etc., I haven't gotten it to work. Basically, I am trying to calculate something like this, but with a variable expression as the summarise argument:

mtcars %>% group_by(cyl) %>% summarise(wt=mean(wt),hp=mean(hp))

yielding:

# A tibble: 3 × 3
    cyl       wt        hp   
    <dbl>    <dbl>     <dbl> 
1     4 2.285727  82.63636 
2     6 3.117143 122.28571 
3     8 3.999214 209.21429

One of the things I tried was:

  x2 <- "wt=mean(wt),hp=mean(hp)"
  mtcars %>% group_by(cyl) %>% summarise(eval(parse(text=x2)))

yielding:

Error in eval(substitute(expr), envir, enclos) : 
  <text>:1:12: unexpected ','
1: wt=mean(wt),

But dropping the second argument (",hp=mean(hp)") gets you no further:

> x2 <- "wt=mean(wt)"
> mtcars %>% group_by(cyl) %>% summarise(eval(parse(text=x2)))
Error in eval(substitute(expr), envir, enclos) : object 'wt' not found

I will spare you all the other things I tried. I am clearly missing something about how expressions get handled in function arguments.

So what is the proper approach here? Keeping in mind I really want something like this in the end:

getdf <- function(df,sumarg){
  df %>% group_by(cyl) %>% summarise(sumarg)
  df
}

Also not sure what kind of tag I should use for this kind of query in the R world. Metaprogramming?


Solution

  • For maximum flexibility, I would use a ... argument, capture those dots using lazyeval, and then pass them to summarise_:

    getdf <- function(df, ...){ 
        df %>% group_by(cyl) %>% summarise_(.dots = lazyeval::lazy_dots(...)) 
    }
    

    Then you can directly do:

    getdf(mtcars, wt = mean(wt), hp = mean(hp))
    
    # A tibble: 3 × 3
        cyl       wt        hp
      <dbl>    <dbl>     <dbl>
    1     4 2.285727  82.63636
    2     6 3.117143 122.28571
    3     8 3.999214 209.21429
    

    One way to do it without ... is to pass the arguments in a list, although you will then need to use formulas or quoted strings. E.g.:

    getdf2 <- function(df, args){ 
        dots <- lazyeval::as.lazy_dots(args)
        df %>% group_by(cyl) %>% summarise_(.dots = dots) 
    }
    

    And use as:

    getdf2(mtcars, list(wt = ~mean(wt), hp = ~mean(hp)))
    

    or

    getdf2(mtcars, list(wt = "mean(wt)", hp = "mean(hp)"))
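
Note that summarise_ and the lazyeval-based interface were deprecated as of dplyr 0.7 in favour of tidy evaluation. Assuming a reasonably recent dplyr and rlang are installed, the same two ideas can be sketched as below. The names getdf_tidy and getdf_chr are made up for this illustration and are not part of any package; treat this as one possible approach rather than the canonical one.

```r
library(dplyr)

# Hypothetical helper: forwards the summary expressions in ... to summarise(),
# which evaluates them against the grouped data.
getdf_tidy <- function(df, ...) {
  df %>% group_by(cyl) %>% summarise(...)
}

# Hypothetical helper: takes a named list (or named character vector) of strings.
# Each string is parsed into an expression; lapply() keeps the names, and
# !!! splices the named expressions in as separate summarise() arguments.
getdf_chr <- function(df, args) {
  exprs <- lapply(args, rlang::parse_expr)
  df %>% group_by(cyl) %>% summarise(!!!exprs)
}

res1 <- getdf_tidy(mtcars, wt = mean(wt), hp = mean(hp))
res2 <- getdf_chr(mtcars, list(wt = "mean(wt)", hp = "mean(hp)"))
```

Both calls should give the same three-row summary as the original summarise call. The string version parses arbitrary text into code, so it is only appropriate when the strings come from a trusted source.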