Search code examples
rdataframemergeduplicatesdata-cleaning

R Merge duplicate columns that have different values in one dataframe


I currently have survey data where a set of Likert-type questions appears twice in the dataset and the set of questions a participant answered depends on an initial response to a binary "check" question. My goal is to merge the sets of duplicate questions. The data looks something like this:

Check Q1 Q2 Q3 Q1.1 Q2.1 Q3.1
1 5 5 4
1 2 5 3
2 4 6 3
2 4 2 1

...where Q1.1 is a duplicate of Q1, and so on for Q2 and Q3

And I'd like my final output to look like this:

Check Q1 Q2 Q3
1 5 5 4
1 2 5 3
2 4 6 3
2 4 2 1

I've been testing out a variety of ideas using things like for-loops, sapply, paste, and cbind. I've run into walls with each of them, particularly because I need to somehow match questions (ex. Q1 gets Q1.1's value when check==2) and run this over a set of multiple columns in one dataset.

Any help on this would be greatly appreciated!


Solution

  • If the missing elements are NA, pivot_longer can be used

    library(tidyr)
    pivot_longer(df1, cols = -Check, names_pattern = "^(Q\\d+).*", 
          names_to = ".value", values_drop_na = TRUE)
    

    -output

    # A tibble: 4 × 4
      Check    Q1    Q2    Q3
      <int> <int> <int> <int>
    1     1     5     5     4
    2     1     2     5     3
    3     2     4     6     3
    4     2     4     2     1
    

    data

    df1 <- structure(list(Check = c(1L, 1L, 2L, 2L), Q1 = c(5L, 2L, NA, 
    NA), Q2 = c(5L, 5L, NA, NA), Q3 = c(4L, 3L, NA, NA), Q1.1 = c(NA, 
    NA, 4L, 4L), Q2.1 = c(NA, NA, 6L, 2L), Q3.1 = c(NA, NA, 3L, 1L
    )), class = "data.frame", row.names = c(NA, -4L))