I created a survey where respondents are asked questions about a number of other people. The resulting dataset has the following variables:
respondent ID
Number of people you have information on (1-4)
Person 1-Variable 1
Person 1-Variable 2
Person 2-Variable 1
Person 2-Variable 2
etc...
If the person only had information on one person, all Person 2-3-4-variables are NA.
Do analyze this properly I need to restructure the data so that each line only has information on one person. So if a respondent gave information on two people, the variables for these two people should be on a new line. The result should be something like:
Respondent ID
Variable 1
Variable 2
etc...
I tried a couple of things messing around with moving non-missing values around. But my knowledge of R is too limited to think of a proper solution.
What you are looking for is to convert your data from a so-called "wide" format to a "long(er)" format. There is a handy function for this: pivot_longer()
(see here for more details).
library(tidyr)
X_wide <- data.frame(id = 1:3, P1 = 4:6, P2 = 7:9, P3 = 10:12)
X_long <- pivot_longer(X_wide, cols = P1:P3, names_to = "person", values_to = "score")
X_long <- as.data.frame(X_long)
This is the result:
> X_wide
id P1 P2 P3
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
> X_long
id person score
1 1 P1 4
2 1 P2 7
3 1 P3 10
4 2 P1 5
5 2 P2 8
6 2 P3 11
7 3 P1 6
8 3 P2 9
9 3 P3 12