Search code examples
rqualtrics

Collapse repeated questions into summative columns in R


I have a dataset created using Qualtrics’ Loop and Merge function, meaning that each respondent responded to 1 of 25 versions of the same 10 questions. The code below produces the resulting data for 3 participants. In total, there’s 25 participants and 250 columns related to these questions, but each person only responded to 10 of them. There’s also an ID number column.

structure(list(id = 1:3, q1..1 = c(2L, NA, NA), q2..1 = c(4L, 
NA, NA), q3..1 = c(3L, NA, NA), q4..1 = c(5L, NA, NA), q5..1 = c(2L, 
NA, NA), q6..1 = c(1L, NA, NA), q7..1 = c(2L, NA, NA), q8..1 = c(4L, 
NA, NA), q9..1 = c(6L, NA, NA), q10..1 = c(3L, NA, NA), q1..2 = c(NA, 
3L, NA), q2..2 = c(NA, 5L, NA), q3..2 = c(NA, 2L, NA), q4..2 = c(NA, 
6L, NA), q5..2 = c(NA, 4L, NA), q6..2 = c(NA, 2L, NA), q7..2 = c(NA, 
3L, NA), q8..2 = c(NA, 4L, NA), q9..2 = c(NA, 2L, NA), q10..2 = c(NA, 
1L, NA), q1..3 = c(NA, NA, 1L), q2..3 = c(NA, NA, 3L), q3..3 = c(NA, 
NA, 4L), q4..3 = c(NA, NA, 2L), q5..3 = c(NA, NA, 4L), q6..3 = c(NA, 
NA, 5L), q7..3 = c(NA, NA, 3L), q8..3 = c(NA, NA, 3L), q9..3 = c(NA, 
NA, 2L), q10..3 = c(NA, NA, 2L)), class = "data.frame", row.names = c(NA, 
-3L))

I’m trying to collapse all of people’s responses into just 10 columns, either by creating 10 new columns and populating them with people’s responses, or just shifting each person’s 10 responses all the way to the left of the dataset and deleting the remaining columns. So I’m trying to get something like what this produces:

structure(list(id = 1:3, q1 = c(2L, 3L, 1L), q2 = c(4L, 5L, 3L
), q3 = c(3L, 2L, 4L), q4 = c(5L, 6L, 2L), q5 = c(2L, 4L, 4L), 
    q6 = c(1L, 2L, 5L), q7 = c(2L, 3L, 3L), q8 = c(4L, 4L, 3L
    ), q9 = c(6L, 2L, 2L), q10 = c(3L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-3L))

I’m not having any success even getting this off the ground, so I don’t have any useful preliminary work to post. Thanks in advance for any help!


Solution

  • We can use split.default to divide the data based on the common part of the column names. Assuming for every question there is only one answer, we can take row-wise sum ignoring the NA values.

    temp <- cbind(df[1], sapply(split.default(df[-1], 
             sub("\\..*", "", names(df)[-1])), function(x) rowSums(x, na.rm = TRUE)))
    
    temp[gtools::mixedsort(names(temp))]
    #  id q1 q2 q3 q4 q5 q6 q7 q8 q9 q10
    #1  1  2  4  3  5  2  1  2  4  6   3
    #2  2  3  5  2  6  4  2  3  4  2   1
    #3  3  1  3  4  2  4  5  3  3  2   2