Search code examples
rloopssumiteration

R : How to iterate sum across multiple columns?


I am trying to iterate/loop a sum across multiple non-consecutive columns. My objective is to compute the subscale score of multiple questionnaires measured repeatedly across time.

Dataset for one questionnaire of x items and n time-points:

df <- tibble(
  ID = 1:5,
  itemA_1 = sample(100, 5, TRUE),
  itemB_1 = sample(100, 5, TRUE),
  itemC_1 = sample(100, 5, TRUE),
  itemD_1 = sample(100, 5, TRUE),
  itemx_1 = sample(100, 5, TRUE),
  itemA_3 = sample(100, 5, TRUE),
  itemB_3 = sample(100, 5, TRUE),
  itemC_3 = sample(100, 5, TRUE),
  itemD_3 = sample(100, 5, TRUE),
  itemx_3 = sample(100, 5, TRUE),
  itemA_n = sample(100, 5, TRUE),
  itemB_n = sample(100, 5, TRUE),
  itemC_n = sample(100, 5, TRUE),
  itemD_n = sample(100, 5, TRUE),
  itemx_n = sample(100, 5, TRUE),
)

The sum for one specific time point works just fine:

df %>% mutate(total_1 = sum(c(itemA_1, itemC_1, itemD_1))

This loop does not work:

for (i in c(1, 3, n)) {
    df %>% mutate(total_i = sum(c(itemA_i, itemC_i, itemD_i))
    }

What am I doing wrong?


Solution

  • We may reshape to 'long' format with pivot_longer and do a group by sum

    library(dplyr)
    library(tidyr)
    df1 <- df %>%
       pivot_longer(cols =-ID, names_to = c("item", ".value"), names_sep = "_") %>% 
      filter(item %in% c("itemA", "itemC", "itemD")) %>%
      group_by(ID) %>%
      summarise(across(where(is.numeric), sum, na.rm = TRUE,
           .names = "total_{.col}")) %>%
      left_join(df, .)
    

    -output

    > df1
    # A tibble: 5 × 19
         ID itemA_1 itemB_1 itemC_1 itemD_1 itemx_1 itemA_3 itemB_3 itemC_3 itemD_3 itemx_3 itemA_n itemB_n itemC_n itemD_n itemx_n total_1
      <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
    1     1      69      27      56      44      54      53      66      28      67      19      65      38      12      45      33     250
    2     2      31      65       7      34      84      19      64      70      27      23      98      65      94      71     100     221
    3     3      58      34      68      18      69     100      24      47      54      60      47      48      81      61      22     247
    4     4      95      16      85      34       9      28      73      57      79      60      57      31      16      24      84     239
    5     5      19      66      43      25      35      31      39      17      15      84      10      23     100       6      74     188
    # … with 2 more variables: total_3 <int>, total_n <int>
    

    If we want to use the for loop, then paste the column names with i, evaluate (!!) while assigning (:=)

    library(stringr)
    for (i in c(1, 3, 'n')) {
         df <- df %>% 
       mutate(!! str_c("total_", i) :=   
          rowSums(across(all_of(str_c(c("itemA_", "itemC_", "itemD_"), i)))))
         }
    

    But, note that this will not be dynamic as we have to manually include the 1, 2, ..., n in the loop

    -checking the output from for loop and reshaping

    > all.equal(df1$total_1, df$total_1)
    [1] TRUE
    > all.equal(df1$total_3, df$total_3)
    [1] TRUE
    > all.equal(df1$total_n, df$total_n)
    [1] TRUE