Trying to get my head around Non-Standard Evaluation as used by dplyr but without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.
Simplified version of my function...
my_summarise <- function(df = temp,
                         to.sum = 'eg1') {
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          mean = mean(~to.sum, na.rm = TRUE))
    return(results)
}
And running it with some dummy data...
temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
              rnorm(n = 100, mean = 3, sd = 6)) %>%
    as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
[1] 1.881721
mean(temp$eg2)
[1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
n mean
1 100 NA
n is calculated, but the mean is not, and I can't figure out why.
Ultimately I'd like my function to be more general, along the lines of...
my_summarise <- function(df = temp, = 'group'
to.sum = c('eg1', 'eg2'),
results <- list()
## Select columns
df <- dplyr::select_(df, .dots = c(, to.sum))
## Summarise overall
results$all <- summarise_each(df,
funs(n = ~n(),
mean = mean(~to.sum, na.rm = TRUE)))
## Summarise by specified group
results$ <- group_by_(df, %>%
funs(n = ~n(),
mean = mean(~to.sum, na.rm = TRUE)))
...but before I move on to this more complex version (for which I was using this example as guidance), I need to get the evaluation working in the simple version first, as that's the stumbling block; the call to dplyr::select() works OK.
Appreciate any advice as to where I'm going wrong.
Thanks in advance
The basic idea is that you have to build the appropriate call yourself, which is most easily done with the lazyeval package.
In this case you want to programmatically create a call that looks like ~mean(eg1, na.rm = TRUE). This is how:
my_summarise <- function(df = temp,
                         to.sum = 'eg1') {
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          ## as.name() turns the string into a variable name
                          mean = lazyeval::interp(~mean(x, na.rm = TRUE),
                                                  x = as.name(to.sum)))
    return(results)
}
Here is what I do when I struggle to get things working:
1. Remember that, like the ~n() you already have, the call will have to start with a ~.
2. Write out the call you want with the actual variable name: ~mean(eg1, na.rm = TRUE).
3. Use interp to recreate that call, and check this by running only the interp to see what it is doing.
In this case I would often first write interp(~mean(x, na.rm = TRUE), x = to.sum). But running that gives us ~mean("eg1", na.rm = TRUE), which treats eg1 as a character string instead of a variable name. So we use as.name, as is taught to us in vignette("nse").
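The check described above can be tried on its own, outside of summarise_. This is only a minimal sketch, assuming the lazyeval package is installed; the small data frame df and its values are made up for illustration and are not the data from the question. It builds the call both ways and then evaluates the right-hand side of each formula against the data with base R's eval.

```r
library(lazyeval)  # assumption: lazyeval is installed

to.sum <- "eg1"
# Hypothetical toy data, only for illustration
df <- data.frame(eg1 = c(1, 2, 3, NA))

# Passing the string directly splices in a character constant
bad <- interp(~mean(x, na.rm = TRUE), x = to.sum)
# Wrapping it in as.name() splices in a variable name (a symbol)
good <- interp(~mean(x, na.rm = TRUE), x = as.name(to.sum))

print(bad)   # ~mean("eg1", na.rm = TRUE)
print(good)  # ~mean(eg1, na.rm = TRUE)

# Element 2 of a one-sided formula is its right-hand side call.
# mean() of a character string warns and returns NA.
bad_value <- suppressWarnings(eval(bad[[2]], df))
# mean() of the eg1 column, ignoring the missing value
good_value <- eval(good[[2]], df)

print(bad_value)   # NA
print(good_value)  # 2
```

Taking the mean of a non-numeric argument returns NA with a warning, so a call that ends up with the wrong kind of argument, as the character version does here, produces an NA in the mean column.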