Search code examples
rdplyrassociations

Separate 1 column into 2 based on occurrences on the same day in R?


I'm trying to do a social network analysis and create a data frame of associations. I have 2 columns, one with ID and one with Date. I want to separate the ID column into a "from" and "to" column based on day to see on which days two ID's were seen together.

This is what my data looks like:

       Date    ID
1   2010-01-02  A
2   2010-01-02  A
3   2010-01-02  A
4   2010-01-02  A
5   2010-01-02  A
6   2010-01-02  A
7   2010-01-02  A
8   2010-01-02  A
9   2010-01-02  A
10  2010-01-02  W
11  2010-01-02  W
12  2010-01-02  W
13  2010-01-02  W
14  2010-01-11  A
15  2010-01-11  A
16  2010-01-11  A
17  2010-01-11  A
18  2010-01-11  A
19  2010-01-11  A
20  2010-01-11  A
21  2010-01-11  A
22  2010-01-11  A
23  2010-01-11  G
24  2010-01-11  G
25  2010-01-11  G
26  2010-01-11  K
27  2010-01-11  K
28  2010-01-11  K
29  2010-01-11  W
30  2010-01-11  W

and I want it to look like this:

    Date   From To
 2010-01-02  A   W
 2010-01-11  A   G
 2010-01-11  A   K 
 2010-01-11  A   W
 2010-01-11  W   G
 2010-01-11  W   K
 2010-01-11  G   K

Solution

  • I don't know that there's an automatic function in dplyr or related packages, but we can do this:

    library(dplyr)
    quux %>%
      reframe(
        setNames(data.frame(t(combn(unique(ID), 2))), c("From", "To")),
        .by = "Date"
      )
    #         Date From To
    # 1 2010-01-02    A  W
    # 2 2010-01-11    A  G
    # 3 2010-01-11    A  K
    # 4 2010-01-11    A  W
    # 5 2010-01-11    G  K
    # 6 2010-01-11    G  W
    # 7 2010-01-11    K  W
    

    Steps:

    • combn(unique(ID), 2) finds the combinations for the observed IDs
    • .by="Date" is one of dplyr's ways to do grouped ops "once"
    • because combn returns a matrix, we need to transpose it and frame it, with data.frame(t(...)); since the default names are "V1" and "V2", we wrap that in setNames to get the names you want
    • a lesser-known functionality of dplyr's data verbs (e.g., mutate, summarize, reframe) is that if the inner expression returns a frame, it will be appended (cbind-style) to the existing frame for mutate, and replace it (with groups) with summarize/reframe, names and all. (This is one reason why I wrapped in data.frame(.).)

    Data

    quux <- structure(list(Date = c("2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11"), ID = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "W",  "W", "W", "W", "A", "A", "A", "A", "A", "A", "A", "A", "A", "G", "G", "G", "K", "K", "K", "W", "W")), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))