I have a large dataset with the following rough structure, where several columns share prefixes:
dataset <- data.frame(a1 = c(1:10), a2 = c(11:20),b1=c(21:30),b2=c(31:40))
I want to create new columns with the row sum across each group of columns which share prefixes , so that it ends up looking like this
a1 a2 b1 b2 a_sum b_sum
1 1 11 21 31 12 52
2 2 12 22 32 14 54
3 3 13 23 33 16 56
4 4 14 24 34 18 58
5 5 15 25 35 20 60
6 6 16 26 36 22 62
7 7 17 27 37 24 64
8 8 18 28 38 26 66
9 9 19 29 39 28 68
10 10 20 30 40 30 70
My preferred method would be somehow feed the character vector of prefixes vec <- c("a","b")
into a mutate function which fits in a larger pipe workflow.
I have been wondering of ways to use across
with matches
, perhaps with some sort of lapply
call to perform it for each element of my vector, but I fall short. I would prefer to avoid solutions which involve reshaping the data, as it is quite large and there are many columns which don't relate to this specific issue.
We can use map
or lapply
over the vec
vector to create the desired one-column tibbles, and finally bind_cols()
to bind them at the end of the original data.frame.
To select columns matching a character prefix, we should usually use the starts_with
selection helper.
The walrus (:=
) operator allows us to dynamically generate column names as desired inside tibble
.
library(dplyr)
library(purrr)
vec <- c("a","b")
dataset |>
bind_cols(vec |>
map(\(x) tibble("{x}_sum" := rowSums(
select(dataset, starts_with(x))
))))
a1 a2 b1 b2 a_sum b_sum
1 1 11 21 31 12 52
2 2 12 22 32 14 54
3 3 13 23 33 16 56
4 4 14 24 34 18 58
5 5 15 25 35 20 60
6 6 16 26 36 22 62
7 7 17 27 37 24 64
8 8 18 28 38 26 66
9 9 19 29 39 28 68
10 10 20 30 40 30 70
Edit
To enable upstream transformations to feed into this operation seamlessly, we can adapt the code to utilize the current data via pick
rather than relying on external provision through select
dataset |>
mutate(bind_cols(vec |>
map(\(x) tibble("{x}_sum" := rowSums(
pick(starts_with(x))
)))))