I have survey data in which a set of Likert-type questions appears twice in the dataset. Which of the two sets a participant answered depends on their response to an initial binary "check" question. My goal is to merge the duplicate sets of questions. The data looks something like this:
Check | Q1 | Q2 | Q3 | Q1.1 | Q2.1 | Q3.1 |
---|---|---|---|---|---|---|
1 | 5 | 5 | 4 | | | |
1 | 2 | 5 | 3 | | | |
2 | | | | 4 | 6 | 3 |
2 | | | | 4 | 2 | 1 |
...where Q1.1 is a duplicate of Q1, and so on for Q2 and Q3.
I'd like my final output to look like this:
Check | Q1 | Q2 | Q3 |
---|---|---|---|
1 | 5 | 5 | 4 |
1 | 2 | 5 | 3 |
2 | 4 | 6 | 3 |
2 | 4 | 2 | 1 |
I've been testing out a variety of ideas using things like for-loops, sapply, paste, and cbind. I've run into walls with each of them, particularly because I need to match questions to their duplicates (e.g. Q1 gets Q1.1's value when Check == 2) and apply this across multiple columns in one dataset.
Any help on this would be greatly appreciated!
If the missing elements are NA, pivot_longer can be used:
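library(tidyr)
# Q1 and Q1.1 both match the captured group "Q1", so ".value" stacks them into one Q1 column;
# values_drop_na = TRUE then drops the rows that contain only NA in the value columns
pivot_longer(df1, cols = -Check, names_pattern = "^(Q\\d+).*",
             names_to = ".value", values_drop_na = TRUE)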
-output
# A tibble: 4 × 4
  Check    Q1    Q2    Q3
  <int> <int> <int> <int>
1     1     5     5     4
2     1     2     5     3
3     2     4     6     3
4     2     4     2     1
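To see why the pattern merges each question with its duplicate, it can help to apply the same regular expression to the column names directly. This is only an illustrative sketch in base R; `cols` and `captured` are hypothetical names, not part of the answer's code:

```r
# Column names from the example data (everything except Check)
cols <- c("Q1", "Q2", "Q3", "Q1.1", "Q2.1", "Q3.1")

# Keep only the captured group "(Q\\d+)", i.e. "Q" followed by digits,
# and discard whatever follows it (such as the ".1" suffix)
captured <- sub("^(Q\\d+).*", "\\1", cols)
print(captured)
# [1] "Q1" "Q2" "Q3" "Q1" "Q2" "Q3"
```

Because Q1 and Q1.1 both reduce to "Q1", pivot_longer treats them as the same output column when names_to is ".value".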
df1 <- structure(list(Check = c(1L, 1L, 2L, 2L), Q1 = c(5L, 2L, NA,
NA), Q2 = c(5L, 5L, NA, NA), Q3 = c(4L, 3L, NA, NA), Q1.1 = c(NA,
NA, 4L, 4L), Q2.1 = c(NA, NA, 6L, 2L), Q3.1 = c(NA, NA, 3L, 1L
)), class = "data.frame", row.names = c(NA, -4L))
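For comparison, the rule described in the question (Q1 takes Q1.1's value when Check is 2) could also be sketched in base R with a loop over the column pairs. This assumes every duplicate column is named by appending ".1" to the original name and that Check == 2 marks the rows answered in the duplicate set; `base_cols`, `dup_cols`, and `merged` are hypothetical names introduced for illustration:

```r
# Same example data as df1 above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  Check = c(1L, 1L, 2L, 2L),
  Q1 = c(5L, 2L, NA, NA), Q2 = c(5L, 5L, NA, NA), Q3 = c(4L, 3L, NA, NA),
  Q1.1 = c(NA, NA, 4L, 4L), Q2.1 = c(NA, NA, 6L, 2L), Q3.1 = c(NA, NA, 3L, 1L)
)

# Assumption: each duplicate column is the original name plus ".1"
base_cols <- c("Q1", "Q2", "Q3")
dup_cols <- paste0(base_cols, ".1")

# Start from Check and the original columns, then fill from the duplicates
merged <- df1[c("Check", base_cols)]
for (i in seq_along(base_cols)) {
  orig <- df1[[base_cols[i]]]
  dup <- df1[[dup_cols[i]]]
  # Rows with Check == 2 take the duplicate's value; all others keep the original
  merged[[base_cols[i]]] <- ifelse(df1$Check == 2, dup, orig)
}
print(merged)
```

Unlike the pivot_longer approach, this keeps one row per participant by construction and does not depend on the unanswered set being NA, but the column pairs have to be listed or derived by hand.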