I'm trying to do a social network analysis and create a data frame of associations. I have 2 columns, one with ID and one with Date. I want to separate the ID column into a "from" and "to" column based on day to see on which days two ID's were seen together.
This is what my data looks like:
Date ID
1 2010-01-02 A
2 2010-01-02 A
3 2010-01-02 A
4 2010-01-02 A
5 2010-01-02 A
6 2010-01-02 A
7 2010-01-02 A
8 2010-01-02 A
9 2010-01-02 A
10 2010-01-02 W
11 2010-01-02 W
12 2010-01-02 W
13 2010-01-02 W
14 2010-01-11 A
15 2010-01-11 A
16 2010-01-11 A
17 2010-01-11 A
18 2010-01-11 A
19 2010-01-11 A
20 2010-01-11 A
21 2010-01-11 A
22 2010-01-11 A
23 2010-01-11 G
24 2010-01-11 G
25 2010-01-11 G
26 2010-01-11 K
27 2010-01-11 K
28 2010-01-11 K
29 2010-01-11 W
30 2010-01-11 W
and I want it to look like this:
Date From To
2010-01-02 A W
2010-01-11 A G
2010-01-11 A K
2010-01-11 A W
2010-01-11 W G
2010-01-11 W K
2010-01-11 G K
I don't know that there's an automatic function in dplyr
or related packages, but we can do this:
library(dplyr)
quux %>%
reframe(
setNames(data.frame(t(combn(unique(ID), 2))), c("From", "To")),
.by = "Date"
)
# Date From To
# 1 2010-01-02 A W
# 2 2010-01-11 A G
# 3 2010-01-11 A K
# 4 2010-01-11 A W
# 5 2010-01-11 G K
# 6 2010-01-11 G W
# 7 2010-01-11 K W
Steps:
combn(unique(ID), 2)
finds the combinations for the observed IDs
.by="Date"
is one of dplyr
's ways to do grouped ops "once"combn
returns a matrix, we need to transpose it and frame it, with data.frame(t(...))
; since the default names are "V1"
and "V2"
, we wrap that in setNames
to get the names you wantdplyr
's data verbs (e.g., mutate
, summarize
, reframe
) is that if the inner expression returns a frame, it will be appended (cbind
-style) to the existing frame for mutate
, and replace it (with groups) with summarize
/reframe
, names and all. (This is one reason why I wrapped in data.frame(.)
.)Data
quux <- structure(list(Date = c("2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-02", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11", "2010-01-11"), ID = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "W", "W", "W", "W", "A", "A", "A", "A", "A", "A", "A", "A", "A", "G", "G", "G", "K", "K", "K", "W", "W")), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))