Search code examples
rtidyversedplyracross

How to subtract two columns using tidyverse mutate with columns named by external variables


I’d like to dynamically assign which columns to subtract from each other. I’ve read around and looks like I need to use all_of, and maybe across (How to subtract one column from multiple columns in a dataframe in R using dplyr, How to you use objects in dplyr filter?). I can get it working for one variable in a mutate phrase (e.g. mutate(y = all_of(x))), but I can’t seem to do even simple calculations using two. Here’s a simplified example of what I want to do:

var1 <- c("Sepal.Length")
var2 <- c("Sepal.Width")

result <- iris %>%
  mutate(calculation = all_of(var1) - all_of(var2))

Solution

  • We may use .data to subset the column as a vector. The all_of/any_of are used along with across to loop across the columns

    library(dplyr)
    iris %>%
      mutate(calculation = .data[[var1]] - .data[[var2]])%>%
      head
    

    -output

    Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
    1          5.1         3.5          1.4         0.2  setosa         1.6
    2          4.9         3.0          1.4         0.2  setosa         1.9
    3          4.7         3.2          1.3         0.2  setosa         1.5
    4          4.6         3.1          1.5         0.2  setosa         1.5
    5          5.0         3.6          1.4         0.2  setosa         1.4
    6          5.4         3.9          1.7         0.4  setosa         1.5
    

    Or may also use cur_data()

    iris %>%
        head %>%
        mutate(calculation = cur_data()[[var1]] - cur_data()[[var2]])
    

    -output

     Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
    1          5.1         3.5          1.4         0.2  setosa         1.6
    2          4.9         3.0          1.4         0.2  setosa         1.9
    3          4.7         3.2          1.3         0.2  setosa         1.5
    4          4.6         3.1          1.5         0.2  setosa         1.5
    5          5.0         3.6          1.4         0.2  setosa         1.4
    6          5.4         3.9          1.7         0.4  setosa         1.5
    

    Or another option is to pass both the variables in across, and then reduce with -

    library(purrr)
    iris %>%
       head %>%
       mutate(calculation = reduce(across(all_of(c(var1, var2))), `-`))
    

    -output

    Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
    1          5.1         3.5          1.4         0.2  setosa         1.6
    2          4.9         3.0          1.4         0.2  setosa         1.9
    3          4.7         3.2          1.3         0.2  setosa         1.5
    4          4.6         3.1          1.5         0.2  setosa         1.5
    5          5.0         3.6          1.4         0.2  setosa         1.4
    6          5.4         3.9          1.7         0.4  setosa         1.5
    

    Or could convert to symbol and evaluate (!!)

    iris %>% 
       head %>% 
       mutate(calculation = !! rlang::sym(var1) - !! rlang::sym(var2))
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
    1          5.1         3.5          1.4         0.2  setosa         1.6
    2          4.9         3.0          1.4         0.2  setosa         1.9
    3          4.7         3.2          1.3         0.2  setosa         1.5
    4          4.6         3.1          1.5         0.2  setosa         1.5
    5          5.0         3.6          1.4         0.2  setosa         1.4
    6          5.4         3.9          1.7         0.4  setosa         1.5
    

    Or if we want to use all_of in across, just subset the column with [[

    iris %>% 
       head %>% 
       mutate(calculation = across(all_of(var1))[[1]] - 
                             across(all_of(var2))[[1]])
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species calculation
    1          5.1         3.5          1.4         0.2  setosa         1.6
    2          4.9         3.0          1.4         0.2  setosa         1.9
    3          4.7         3.2          1.3         0.2  setosa         1.5
    4          4.6         3.1          1.5         0.2  setosa         1.5
    5          5.0         3.6          1.4         0.2  setosa         1.4
    6          5.4         3.9          1.7         0.4  setosa         1.5
    

    The reason we need to subset is because, across by default will update the original column when the .names is not present. The calculation will be a data.frame with a single column

    out <- iris %>%
        head %>% 
        mutate(calculation = across(all_of(var1)) -
               across(all_of(var2)))
    out
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
    1          5.1         3.5          1.4         0.2  setosa          1.6
    2          4.9         3.0          1.4         0.2  setosa          1.9
    3          4.7         3.2          1.3         0.2  setosa          1.5
    4          4.6         3.1          1.5         0.2  setosa          1.5
    5          5.0         3.6          1.4         0.2  setosa          1.4
    6          5.4         3.9          1.7         0.4  setosa          1.5
    
    str(out)
    data.frame':    6 obs. of  6 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1
     $ calculation :'data.frame':   6 obs. of  1 variable:
      ..$ Sepal.Length: num  1.6 1.9 1.5 1.5 1.4 1.5