
Using dplyr within a function, non-standard evaluation

Trying to get my head around Non-Standard Evaluation as used by dplyr but without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.

Simplified version of my function...

library(dplyr)

my_summarise <- function(df = temp,
                         to.sum = 'eg1') {
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          mean = mean(~to.sum, na.rm = TRUE))
    return(results)
}

And running it with some dummy data...

temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
              rnorm(n = 100, mean = 3, sd = 6)) %>%
        as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
  [1] 1.881721
mean(temp$eg2)
  [1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
    n mean
1 100   NA

N is calculated, but the mean is not, and I can't figure out why.
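The NA can be reproduced without dplyr at all. The base-R sketch below (not from the original thread) assumes, as I believe is the case for the lazyeval-based summarise_(), that an ordinary argument such as mean(~to.sum, na.rm = TRUE) is evaluated by R before dplyr ever sees it. In that case the mean being taken is of the formula object ~to.sum, not of the column eg1.

```r
## `~to.sum` is just a formula object; it does not refer to the column eg1.
f <- ~to.sum
class(f)                               # "formula"
is.numeric(f)                          # FALSE

## mean() on a non-numeric object warns and returns NA, which would be the
## constant that ends up in the summary's mean column.
res <- suppressWarnings(mean(f, na.rm = TRUE))
res                                    # NA

## The warning itself says what went wrong.
msg <- tryCatch(mean(f, na.rm = TRUE), warning = conditionMessage)
msg  # "argument is not numeric or logical: returning NA"
```

By contrast, the ~n() argument is left as a formula, which summarise_() knows how to evaluate against the data, so n comes out correctly.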

Ultimately I'd like my function to be more general, along the lines of...

my_summarise <- function(df = temp,
                          = 'group',
                         to.sum = c('eg1', 'eg2')) {
    results <- list()
    ## Select columns
    df <- dplyr::select_(df, .dots = c(, to.sum))
    ## Summarise overall
    results$all <- summarise_each(df,
                                  funs(n = ~n(),
                                       mean = mean(~to.sum, na.rm = TRUE)))
    ## Summarise by specified group
    results$ <- group_by_(df, ) %>%
                summarise_each(funs(n = ~n(),
                                    mean = mean(~to.sum, na.rm = TRUE)))
    return(results)
}

...but before I move on to this more complex version (for which I was using this example as guidance), I need to get the evaluation working in the simple version first, as that's the stumbling block. The call to dplyr::select_() works OK.

I'd appreciate any advice as to where I'm going wrong.

Thanks in advance


  • The basic idea is that you have to build the appropriate call yourself, which is most easily done with the lazyeval package.

    In this case you want to programmatically create a call that looks like ~mean(eg1, na.rm = TRUE). This is how:

    my_summarise <- function(df = temp,
                             to.sum = 'eg1') {
      ## Summarise
      results <- summarise_(df,
                            n = ~n(),
                            mean = lazyeval::interp(~mean(x, na.rm = TRUE),
                                                    x = as.name(to.sum)))
      return(results)
    }
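If lazyeval is not at hand, the same interpolation can be sketched in base R. In the block below, substitute() stands in for lazyeval::interp() (a swapped-in technique, not what the answer uses), make_mean_formula() is a hypothetical helper name, temp is regenerated dummy data, and the final eval() only approximates what summarise_() does with a formula; it is not dplyr's internal code.

```r
## Dummy data in the spirit of the question's `temp` (assumption: the exact
## values do not matter, so a seed is set only for reproducibility).
set.seed(1)
temp <- data.frame(eg1 = rnorm(100, mean = 2, sd = 4),
                   eg2 = rnorm(100, mean = 3, sd = 6))

## Hypothetical helper, for illustration only: a base-R analogue of
## lazyeval::interp(~mean(x, na.rm = TRUE), x = as.name(col)).
make_mean_formula <- function(col) {
  # substitute() swaps the placeholder `x` for the symbol named by `col`,
  # giving the unevaluated call ~mean(<col>, na.rm = TRUE).
  template <- substitute(~mean(x, na.rm = TRUE), list(x = as.name(col)))
  # Evaluating that call creates the formula object itself.
  eval(template)
}

f <- make_mean_formula("eg1")
rhs <- f[[2]]                     # the stored, unevaluated right-hand side
deparse(rhs)                      # "mean(eg1, na.rm = TRUE)"

## Roughly what a formula-aware verb does later: evaluate the stored call
## with the data frame's columns in scope, falling back to the formula's
## environment for anything else (such as mean itself).
value <- eval(rhs, envir = temp, enclos = environment(f))
all.equal(value, mean(temp$eg1))  # TRUE
```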

    Here is what I do when I struggle to get things working:

    1. Remember that, just like the ~n() you already have, the call will have to start with a ~.
    2. Write the correct call with the actual variable and see if it works (~mean(eg1, na.rm = TRUE)).
    3. Use lazyeval::interp to recreate that call, and check it by running only the interp() call to see what it produces.
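Steps 2 and 3 can be tried out in plain R. This sketch uses base R's bquote() instead of lazyeval::interp() (a swapped-in technique; inside bquote(), .() marks the piece to splice in) and compares the hand-written call with the programmatically built one. If the two are identical, the built call will behave exactly like the hand-written one.

```r
to.sum <- "eg1"

## Step 2: the call written by hand with the real column name.
by_hand <- quote(mean(eg1, na.rm = TRUE))

## Step 3: rebuild the same call programmatically, then inspect it.
built <- bquote(mean(.(as.name(to.sum)), na.rm = TRUE))
print(built)                  # mean(eg1, na.rm = TRUE)
identical(built, by_hand)     # TRUE
```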

    In this case I would often start by writing interp(~mean(x, na.rm = TRUE), x = to.sum). But running that gives ~mean("eg1", na.rm = TRUE), which treats eg1 as a character string instead of a variable name. So, as taught in vignette("nse"), we use as.name(to.sum) to turn the string into a name.
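The difference between splicing in the string "eg1" and the name eg1 can be seen directly. This base-R sketch uses substitute() in place of interp() and regenerates hypothetical dummy data temp. The last few lines show, under the same assumptions, how as.name() extends to a vector of column names such as to.sum = c('eg1', 'eg2') by building one call per name, which is the idea the more general version in the question needs.

```r
set.seed(1)   # assumption: dummy data in the spirit of the question's `temp`
temp <- data.frame(eg1 = rnorm(100, mean = 2, sd = 4),
                   eg2 = rnorm(100, mean = 3, sd = 6))
to.sum <- "eg1"

## Same template, two different things spliced in.
as_string <- substitute(mean(x, na.rm = TRUE), list(x = to.sum))
as_symbol <- substitute(mean(x, na.rm = TRUE), list(x = as.name(to.sum)))
print(as_string)   # mean("eg1", na.rm = TRUE)
print(as_symbol)   # mean(eg1, na.rm = TRUE)

## With the data's columns in scope, only the symbol version finds the column.
suppressWarnings(eval(as_string, temp))   # NA: the mean of the string "eg1"
eval(as_symbol, temp)                     # the mean of temp$eg1

## Several columns: build one call per name, then evaluate each against temp.
to.sum <- c("eg1", "eg2")
calls <- lapply(to.sum, function(col) {
  substitute(mean(x, na.rm = TRUE), list(x = as.name(col)))
})
names(calls) <- paste0("mean_", to.sum)
means <- vapply(calls, eval, numeric(1), envir = temp)
means    # named numeric vector: mean_eg1, mean_eg2
```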