I am transitioning to dplyr from base R.
I would like to shorten the following code to respect the DRY (Don't Repeat Yourself) principle:
mtcars %>% mutate(w = rowMeans(select(., mpg:disp), na.rm = TRUE),
x = rowMeans(select(., hp:wt), na.rm = TRUE),
y = rowMeans(select(., qsec:am), na.rm = TRUE),
z = rowMeans(select(., gear:carb), na.rm = TRUE))
or
mtcars %>% rowwise() %>% mutate(w = mean(mpg:disp, na.rm = TRUE),
x = mean(hp:wt, na.rm = TRUE),
y = mean(qsec:am, na.rm = TRUE),
z = mean(gear:carb, na.rm = TRUE))
# Note: this one produced an error with my own data
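A likely reason the second version misbehaves: inside mutate(), `mpg:disp` is not a column selection but the ordinary sequence operator applied to the two values in the current row. The minimal sketch below compares it with c_across() on the first row of mtcars only. The column names `seq_mean` and `col_mean` are made up for illustration. With missing values, the `:` operator itself fails, which may be the error seen with other data.

```r
library(dplyr)

first_row <- mtcars %>%
  slice(1) %>%
  rowwise() %>%
  mutate(
    # `:` builds the sequence 21, 22, ..., 160 from the mpg and disp values
    seq_mean = mean(mpg:disp),
    # c_across() collects the values of mpg, cyl and disp for this row
    col_mean = mean(c_across(mpg:disp))
  ) %>%
  ungroup()

first_row %>% select(mpg, cyl, disp, seq_mean, col_mean)
```

Only `col_mean` matches what rowMeans() returns for the same three columns.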
The goal is to compute the means of several scales (groups of columns) in a data frame in a single call. As you can see, rowMeans, select, and na.rm are repeated for every scale (imagine I have many more variables than in this example).
I was trying to come up with an across() solution:
mtcars %>% mutate(across(mpg:carb, mean, .names = "mean_{col}"))
But it doesn't produce the correct outcome, because I don't see how to specify a different set of columns for each of w to z. Using c_across from the documentation example brings us back to repeating code:
mtcars %>% rowwise() %>% mutate(w = mean(c_across(mpg:disp), na.rm = TRUE),
x = mean(c_across(hp:wt), na.rm = TRUE),
y = mean(c_across(qsec:am), na.rm = TRUE),
z = mean(c_across(gear:carb), na.rm = TRUE))
I am tempted to resort to lapply or a custom function, but I feel that would defeat the purpose of adapting to dplyr and the new across() function.
Edit: To clarify, I want to avoid writing rowMeans, select, and na.rm more than once.
A new, slightly shorter solution as of dplyr 1.1.0, using the new pick() function:
library(dplyr)
mtcars %>% mutate(w = rowMeans(pick(mpg:disp), na.rm = TRUE),
x = rowMeans(pick(hp:wt), na.rm = TRUE),
y = rowMeans(pick(qsec:am), na.rm = TRUE),
z = rowMeans(pick(gear:carb), na.rm = TRUE)) %>%
head()
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb         w
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  62.33333
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  62.33333
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  44.93333
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  95.13333
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 128.90000
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  83.03333
#>                          x        y   z
#> Mazda RX4         38.84000 5.820000 4.0
#> Mazda RX4 Wag     38.92500 6.006667 4.0
#> Datsun 710        33.05667 6.870000 2.5
#> Hornet 4 Drive    38.76500 6.813333 2.0
#> Hornet Sportabout 60.53000 5.673333 2.5
#> Valiant           37.07333 7.073333 2.0
Explanation: pick() selects columns directly inside mutate(), so we no longer need to pass the dot placeholder (.) to select().
Created on 2023-05-19 with reprex v2.0.2
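The pick() version is shorter, but rowMeans() and na.rm still appear once per scale. One possible way to write them only once is to keep the column ranges in a named list and loop over it. This is a sketch rather than an established dplyr idiom, and it uses lapply, which the question hoped to avoid. The names `scales`, `row_means`, and `result` are made up for illustration, and `rlang::exprs()` comes from rlang, which is installed alongside dplyr.

```r
library(dplyr)

# Hypothetical lookup: new column name -> column range, stored unevaluated
scales <- rlang::exprs(
  w = mpg:disp,
  x = hp:wt,
  y = qsec:am,
  z = gear:carb
)

# rowMeans(), select() and na.rm each appear exactly once
row_means <- lapply(scales, function(cols) {
  rowMeans(select(mtcars, !!cols), na.rm = TRUE)
})

# Splice the named list of vectors into mutate() as new columns
result <- mutate(mtcars, !!!row_means)

head(result)
```

Adding another scale then only requires one more entry in `scales`.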