Consider the following:
library(tidyverse)
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))
Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))
Clearly this won't work, because the columns are not named avg and dev but x_avg, x_dev, y_avg, y_dev, etc.
Is there a good way, within funs, to use the rlang tools to create those column references programmatically, so that I can refer to columns created by the previous named arguments to funs (when . is x, I would reference x_avg and x_dev for calculating x_scaled, and so forth)?
This seems a little convoluted, but it works:
scaled <- function(col_name, x, y) {
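  # Recover the name of the column that was passed in as `.`
  col_name <- deparse(substitute(col_name))
  # Look up <col><suffix> (e.g. x_avg, x_dev) in the frame scaled() was called from
  avg <- eval.parent(as.symbol(paste0(col_name, x)))
  dev <- eval.parent(as.symbol(paste0(col_name, y)))
  (eval.parent(as.symbol(col_name)) - avg) / dev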
}
df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = scaled(., "_avg", "_dev")))
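For comparison, here is a rough sketch of another way to compute mean and sd only once per column. It is a different technique from the funs approach above: it assumes dplyr 1.1.0 or later, where across() accepts .unpack = TRUE, and it uses a hypothetical helper, scale_stats, that is not part of any package. The set.seed call is only there to make the example reproducible.

```r
library(dplyr)
library(tibble)

set.seed(1)  # illustrative seed, not in the original question
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

# Hypothetical helper: computes mean and sd once for a column and
# returns the three derived columns together as a tibble.
scale_stats <- function(col) {
  avg <- mean(col)
  dev <- sd(col)
  tibble(avg = avg, dev = dev, scaled = (col - avg) / dev)
}

# .unpack = TRUE (assumes dplyr >= 1.1.0) spreads each returned tibble
# into columns named <column>_<stat>, e.g. x_avg, x_dev, x_scaled.
res <- df %>%
  mutate(across(everything(), scale_stats, .unpack = TRUE))

names(res)
```

Because avg and dev are ordinary local variables inside the helper, there is no need to build column references by name; the trade-off is that the three statistics are defined in one function rather than as separate named arguments.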