How to avoid repeating code in dplyr::mutate() call with multiple arguments?


Problem

I am transitioning to dplyr from base R.

I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:

mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))

or

mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data

Goal

The goal is to compute the means of different scales in a data frame from a single call. As you can see, rowMeans, select, and na.rm are repeated for every scale (imagine I have many more variables than in this example).

Attempts

I was trying to come up with an across() solution,

mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

But it doesn't produce the correct outcome because I don't see how to specify different column ranges for w:z. Using c_across() as in the documentation example brings us back to repeating code:

mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))

I am tempted to resort to lapply or a custom function, but I feel like that would defeat the purpose of adapting to dplyr and the new across() function.

Edit: To clarify, I want to avoid calling rowMeans, select, and na.rm more than once.

Related threads: 1, 2, 3.


Solution

  • New slightly shorter solution as of dplyr 1.1.0 using the new pick() function:

    library(dplyr)
    
    mtcars %>% mutate(w = rowMeans(pick(mpg:disp), na.rm = TRUE),
                      x = rowMeans(pick(hp:wt), na.rm = TRUE),
                      y = rowMeans(pick(qsec:am), na.rm = TRUE),
                      z = rowMeans(pick(gear:carb), na.rm = TRUE)) %>% 
      head()
    #>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb         w
    #> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  62.33333
    #> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  62.33333
    #> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  44.93333
    #> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  95.13333
    #> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 128.90000
    #> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  83.03333
    #>                          x        y   z
    #> Mazda RX4         38.84000 5.820000 4.0
    #> Mazda RX4 Wag     38.92500 6.006667 4.0
    #> Datsun 710        33.05667 6.870000 2.5
    #> Hornet 4 Drive    38.76500 6.813333 2.0
    #> Hornet Sportabout 60.53000 5.673333 2.5
    #> Valiant           37.07333 7.073333 2.0
    

    Explanation: pick() selects columns from the current data frame inside mutate(), so we no longer need to pass the dot placeholder as we did with select(., ...).

    Created on 2023-05-19 with reprex v2.0.2
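
The pick() version still repeats rowMeans() and na.rm = TRUE once per scale. One possible way to write them only once is to keep the column groups in a named list, compute each mean in a loop, and splice the results into mutate(). This is a minimal sketch, not part of the original answer: the `scales` list and the `scale_means` name are hypothetical, and each range (e.g. mpg:disp) is spelled out as explicit column names.

```r
library(dplyr)

# Hypothetical lookup: new column name -> columns to average.
# These spell out the ranges mpg:disp, hp:wt, qsec:am, and gear:carb.
scales <- list(
  w = c("mpg", "cyl", "disp"),
  x = c("hp", "drat", "wt"),
  y = c("qsec", "vs", "am"),
  z = c("gear", "carb")
)

# rowMeans(), select(), and na.rm each appear exactly once.
# unname() drops the row names that rowMeans() attaches to its result.
scale_means <- lapply(scales, function(cols) {
  unname(rowMeans(select(mtcars, all_of(cols)), na.rm = TRUE))
})

# Splice the named list into mutate(): each element becomes one new column.
result <- mtcars %>% mutate(!!!scale_means)

head(result)
```

Adding another scale then only means adding one entry to `scales`. The trade-off is that the means are computed outside the data mask, so the data frame (here mtcars) is named explicitly inside the helper rather than picked up from the pipe.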