Search code examples
rdplyrtidyrreshape2

long to wide - converting one column (with multiple measures) into a pair of columns


I have been trying to reshape a clustered-structure long dataset to wide format: within each cluster, the wide format will have two pairings of columns representing the long format measure.

Here's an example of the long format:

set.seed(20190419)
   dt1<-data.frame(
     id=sample(1:10),
     fam=c(rep(c('a','b'),each=3),rep(c('d'),each=4)),
     value=rnorm(10))

dt1  

id is the unique variable for each individual, fam is the clustering id (family id), and value is the measure.

Instead of having 1 row and 4 columns representing values of each member within the family, I would like the wide format to have multiple rows with 2 columns showing pairs of family member's value. For example, with a family of size 3, the wide format will include 3 rows indicating values from member 1-2, 1-3, 2-3.

And here's the ideal wide format:

wide format


Solution

  • You're looking for a self-join:

    dt1 %>% left_join(dt1, by = "fam") %>%
      filter(id.x < id.y)
    #    id.x fam    value.x id.y    value.y
    # 1     1   a  0.1231950    2  0.1090744
    # 2     1   a  0.1231950    6  0.6753560
    # 3     2   a  0.1090744    6  0.6753560
    # 4     5   b  1.2385780    7  0.7440739
    # 5     5   b  1.2385780   10  2.4763114
    # 6     7   b  0.7440739   10  2.4763114
    # 7     3   d -2.3296467    8  0.1370525
    # 8     3   d -2.3296467    4 -0.5790046
    # 9     3   d -2.3296467    9  0.3267871
    # 10    8   d  0.1370525    9  0.3267871
    # 11    4   d -0.5790046    8  0.1370525
    # 12    4   d -0.5790046    9  0.3267871