I have been trying to reshape a clustered-structure long dataset to wide format: within each cluster, the wide format will have two pairings of columns representing the long format measure.
Here's an example of the long format:
set.seed(20190419)
dt1<-data.frame(
id=sample(1:10),
fam=c(rep(c('a','b'),each=3),rep(c('d'),each=4)),
value=rnorm(10))
dt1
id is the unique variable for each individual, fam is the clustering id (family id), and value is the measure.
Instead of having 1 row and 4 columns representing values of each member within the family, I would like the wide format to have multiple rows with 2 columns showing pairs of family member's value. For example, with a family of size 3, the wide format will include 3 rows indicating values from member 1-2, 1-3, 2-3.
And here's the ideal wide format:
You're looking for a self-join:
dt1 %>% left_join(dt1, by = "fam") %>%
filter(id.x < id.y)
# id.x fam value.x id.y value.y
# 1 1 a 0.1231950 2 0.1090744
# 2 1 a 0.1231950 6 0.6753560
# 3 2 a 0.1090744 6 0.6753560
# 4 5 b 1.2385780 7 0.7440739
# 5 5 b 1.2385780 10 2.4763114
# 6 7 b 0.7440739 10 2.4763114
# 7 3 d -2.3296467 8 0.1370525
# 8 3 d -2.3296467 4 -0.5790046
# 9 3 d -2.3296467 9 0.3267871
# 10 8 d 0.1370525 9 0.3267871
# 11 4 d -0.5790046 8 0.1370525
# 12 4 d -0.5790046 9 0.3267871