r dplyr rlang

In a named argument to dplyr::funs, can I reference the names of other arguments?


Consider the following:

library(tidyverse)

df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - mean(.)) / sd(.)))

Is there a way to avoid calling mean and sd twice by referencing the avg and dev columns? What I have in mind is something like:

df %>%
  mutate_all(funs(avg = mean(.), dev = sd(.), scaled = (. - avg) / dev))

Clearly this won't work, because the new columns are not named avg and dev but x_avg, x_dev, y_avg, y_dev, etc.

Is there a good way, within funs, to use the rlang tools to create those column references programmatically, so that I can refer to columns created by the previous named arguments to funs? For example, when . is x, I would reference x_avg and x_dev to calculate x_scaled, and so forth.


Solution

  • This seems a little convoluted, but it works:

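    # x and y are the suffixes of the mean and sd columns, e.g. "_avg" and "_dev".
    scaled <- function(col_name, x, y) {
      # By the time scaled() is called, . has been replaced by the column
      # symbol, so this recovers the column name as a string, e.g. "x".
      col_name <- deparse(substitute(col_name))
      # Look up the sibling columns by name in the caller's frame.
      avg <- eval.parent(as.symbol(paste0(col_name, x)))  # e.g. x_avg
      dev <- eval.parent(as.symbol(paste0(col_name, y)))  # e.g. x_dev
      (eval.parent(as.symbol(col_name)) - avg) / dev
    }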
    
    df %>%
      mutate_all(funs(avg = mean(.), dev = sd(.), scaled = scaled(., "_avg", "_dev")))
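
  • A possible alternative, offered as a sketch rather than as part of the answer above: funs() has been deprecated since dplyr 0.8.0, and with across() the mean and standard deviation can be computed once per column inside a helper that returns all three results as a tibble. This assumes dplyr 1.1.0 or later, where across() accepts .unpack = TRUE. center_scale is a hypothetical helper name, and set.seed(1) is added only to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the example is reproducible
df <- tibble(x = rnorm(100), y = rnorm(100, 10, 2), z = x * y)

# Hypothetical helper: computes mean and sd once, then reuses them.
# The length-1 avg and dev values are recycled to the length of v.
center_scale <- function(v) {
  avg <- mean(v)
  dev <- sd(v)
  tibble(avg = avg, dev = dev, scaled = (v - avg) / dev)
}

# .unpack = TRUE (assumes dplyr >= 1.1.0) expands each returned tibble
# into separate columns named "{column}_{inner}", e.g. x_avg, x_dev, x_scaled.
result <- df %>%
  mutate(across(everything(), center_scale, .unpack = TRUE))

names(result)
```

    Under these assumptions the new columns get the same names as in the question (x_avg, x_dev, x_scaled, and so on), and nothing has to be looked up by a pasted-together name.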