How to avoid repeating code in dplyr::mutate() call with multiple arguments?


I am transitioning to dplyr from base R.

I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:

mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
                  x = rowMeans(select(., hp:wt), na.rm = TRUE),
                  y = rowMeans(select(., qsec:am), na.rm = TRUE),
                  z = rowMeans(select(., gear:carb), na.rm = TRUE))


mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
                                x = mean(hp:wt, na.rm = TRUE),
                                y = mean(qsec:am, na.rm = TRUE),
                                z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data


The goal is to compute the means of different scales in a data frame from a single call. As you can see, rowMeans, select, and na.rm are repeated for every new column (and imagine I have many more variables than in this example).


I was trying to come up with an across() solution,

mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))

But it doesn't produce the correct outcome, because I don't see how to specify a different column range for each of w, x, y, and z. Using c_across() from the documentation example brings us back to repeating code:

mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
                                x = mean(c_across(hp:wt), na.rm = TRUE),
                                y = mean(c_across(qsec:am), na.rm = TRUE),
                                z = mean(c_across(gear:carb), na.rm = TRUE))

I am tempted to resort to lapply or a custom function, but I feel like that would defeat the purpose of adapting to dplyr and the new across() function.

Edit: To clarify, I want to avoid calling rowMeans, select, and na.rm more than once.

  • New slightly shorter solution as of dplyr 1.1.0 using the new pick() function:

    mtcars %>% mutate(w = rowMeans(pick(mpg:disp), na.rm = TRUE),
                      x = rowMeans(pick(hp:wt), na.rm = TRUE),
                      y = rowMeans(pick(qsec:am), na.rm = TRUE),
                      z = rowMeans(pick(gear:carb), na.rm = TRUE)) %>%
      head()
    #>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb         w
    #> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  62.33333
    #> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  62.33333
    #> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  44.93333
    #> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  95.13333
    #> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 128.90000
    #> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  83.03333
    #>                          x        y   z
    #> Mazda RX4         38.84000 5.820000 4.0
    #> Mazda RX4 Wag     38.92500 6.006667 4.0
    #> Datsun 710        33.05667 6.870000 2.5
    #> Hornet 4 Drive    38.76500 6.813333 2.0
    #> Hornet Sportabout 60.53000 5.673333 2.5
    #> Valiant           37.07333 7.073333 2.0

    Explanation: inside mutate(), pick() selects columns from the current data, so we no longer need to pass the dot placeholder explicitly as in select(., ...).
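
    If the remaining repetition of rowMeans(), pick(), and na.rm is still a concern, one possible approach is to write the call once as a template and splice the generated expressions into mutate(). This is only a sketch, not part of the answer above: it assumes dplyr >= 1.1.0 and the rlang package are installed, and the names scales and mean_exprs are hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical mapping: new column name -> column range to average.
scales <- exprs(
  w = mpg:disp,
  x = hp:wt,
  y = qsec:am,
  z = gear:carb
)

# Build one unevaluated rowMeans(pick(<range>), na.rm = TRUE) call per scale.
# rowMeans, pick, and na.rm each appear only once, in this template.
mean_exprs <- lapply(scales, function(cols) {
  expr(rowMeans(pick(!!cols), na.rm = TRUE))
})

# Splice the named expressions into a single mutate() call.
result <- mtcars %>% mutate(!!!mean_exprs)

head(result)
```

    Adding another scale then only requires one more entry in scales. Whether this is clearer than the explicit version is a judgment call, since the tidy evaluation operators (!! and !!!) add some indirection.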

