I am trying to iterate/loop a sum across multiple non-consecutive columns. My objective is to compute the subscale score of multiple questionnaires measured repeatedly across time.
Dataset for one questionnaire of x items and n time-points:
df <- tibble(
ID = 1:5,
itemA_1 = sample(100, 5, TRUE),
itemB_1 = sample(100, 5, TRUE),
itemC_1 = sample(100, 5, TRUE),
itemD_1 = sample(100, 5, TRUE),
itemx_1 = sample(100, 5, TRUE),
itemA_3 = sample(100, 5, TRUE),
itemB_3 = sample(100, 5, TRUE),
itemC_3 = sample(100, 5, TRUE),
itemD_3 = sample(100, 5, TRUE),
itemx_3 = sample(100, 5, TRUE),
itemA_n = sample(100, 5, TRUE),
itemB_n = sample(100, 5, TRUE),
itemC_n = sample(100, 5, TRUE),
itemD_n = sample(100, 5, TRUE),
itemx_n = sample(100, 5, TRUE),
)
The sum for one specific time point works just fine:
df %>% mutate(total_1 = sum(c(itemA_1, itemC_1, itemD_1))
This loop does not work:
for (i in c(1, 3, n)) {
df %>% mutate(total_i = sum(c(itemA_i, itemC_i, itemD_i))
}
What am I doing wrong?
We may reshape to 'long' format with pivot_longer
and do a group by sum
library(dplyr)
library(tidyr)
df1 <- df %>%
pivot_longer(cols =-ID, names_to = c("item", ".value"), names_sep = "_") %>%
filter(item %in% c("itemA", "itemC", "itemD")) %>%
group_by(ID) %>%
summarise(across(where(is.numeric), sum, na.rm = TRUE,
.names = "total_{.col}")) %>%
left_join(df, .)
-output
> df1
# A tibble: 5 × 19
ID itemA_1 itemB_1 itemC_1 itemD_1 itemx_1 itemA_3 itemB_3 itemC_3 itemD_3 itemx_3 itemA_n itemB_n itemC_n itemD_n itemx_n total_1
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 69 27 56 44 54 53 66 28 67 19 65 38 12 45 33 250
2 2 31 65 7 34 84 19 64 70 27 23 98 65 94 71 100 221
3 3 58 34 68 18 69 100 24 47 54 60 47 48 81 61 22 247
4 4 95 16 85 34 9 28 73 57 79 60 57 31 16 24 84 239
5 5 19 66 43 25 35 31 39 17 15 84 10 23 100 6 74 188
# … with 2 more variables: total_3 <int>, total_n <int>
If we want to use the for
loop, then paste
the column names with i
, evaluate (!!
) while assigning (:=
)
library(stringr)
for (i in c(1, 3, 'n')) {
df <- df %>%
mutate(!! str_c("total_", i) :=
rowSums(across(all_of(str_c(c("itemA_", "itemC_", "itemD_"), i)))))
}
But, note that this will not be dynamic as we have to manually include the 1, 2, ..., n
in the loop
-checking the output from for
loop and reshaping
> all.equal(df1$total_1, df$total_1)
[1] TRUE
> all.equal(df1$total_3, df$total_3)
[1] TRUE
> all.equal(df1$total_n, df$total_n)
[1] TRUE