Search code examples
rdataframerowsanalysis

How can I move subsets of rows to new rows in R?


I created a survey where respondents are asked questions about a number of other people. The resulting dataset has the following variables: respondent ID
Number of people you have information on (1-4)
Person 1-Variable 1
Person 1-Variable 2
Person 2-Variable 1
Person 2-Variable 2
etc...

If the person only had information on one person, all Person 2-3-4-variables are NA.

Do analyze this properly I need to restructure the data so that each line only has information on one person. So if a respondent gave information on two people, the variables for these two people should be on a new line. The result should be something like:

Respondent ID
Variable 1
Variable 2
etc...

I tried a couple of things messing around with moving non-missing values around. But my knowledge of R is too limited to think of a proper solution.


Solution

  • What you are looking for is to convert your data from a so-called "wide" format to a "long(er)" format. There is a handy function for this: pivot_longer() (see here for more details).

    library(tidyr)
    
    X_wide <- data.frame(id = 1:3, P1 = 4:6, P2 = 7:9, P3 = 10:12)
    X_long <- pivot_longer(X_wide, cols = P1:P3, names_to = "person", values_to = "score")
    X_long <- as.data.frame(X_long)
    

    This is the result:

    > X_wide
      id P1 P2 P3
    1  1  4  7 10
    2  2  5  8 11
    3  3  6  9 12
    > X_long
      id person score
    1  1     P1     4
    2  1     P2     7
    3  1     P3    10
    4  2     P1     5
    5  2     P2     8
    6  2     P3    11
    7  3     P1     6
    8  3     P2     9
    9  3     P3    12