I am working with a dataset df1 in R. It has two columns: Date and Student_ID. I first want to find out the number of pairs of students who have appeared together in a certain date. Suppose these pairs of students are in a column V. Now I want to create a dataframe with columns V such that corresponding to every Date, we have a 0 or 1 under every column if the pair of students appeared on the same date. I am having trouble writing this code as it is quite a complex data manipulation to come up with.
This is how the dataframe looks like:
> head(pair_combinations, 20)
# A tibble: 20 × 2
DATE STUDENT_ID
<int> <chr>
1 21500001 203145
2 21500001 200794
3 21500001 201143
4 21500001 2594
5 21500001 201952
6 21500001 200757
7 21500001 203934
8 21500001 203471
9 21500001 203488
10 21500001 203118
11 21500019 203145
12 21500019 200794
13 21500019 201143
14 21500019 2594
15 21500019 201952
16 21500019 203471
17 21500019 203934
18 21500019 203118
19 21500019 203501
20 21500026 203145
I want the output like:
DATE INTERACTION_203145_200794 INTERACTION_203145_201143 ...
<int> <chr>
1 21500001 1 1
2 21500019 1 1 ...
...
It's not pretty, but I'm guessing this might work
First, a helper function to get all pairs of values
all_pairs<-function(x) {
if (length(x) >= 2) {
sapply(combn(sort(unique(x)),2, simplify=FALSE), paste, collapse="_")
} else {
NA
}
}
We sort the values so the smaller ID always comes first so we don't repeat pairs (it will always be A_B and not B_A).
Now we can do some reshaping
library(dplyr)
library(tidyr)
dd %>%
group_by(DATE) %>%
reframe(intx = all_pairs(STUDENT_ID)) %>%
mutate(val = 1) %>%
pivot_wider(names_from = intx, values_from = val, values_fill = 0)
Tested with
dd <- read.table(text="
DATE STUDENT_ID
1 21500001 203145
2 21500001 200794
3 21500001 201143
4 21500001 2594
5 21500001 201952
6 21500001 200757
7 21500001 203934
8 21500001 203471
9 21500001 203488
10 21500001 203118
11 21500019 203145
12 21500019 200794
13 21500019 201143
14 21500019 2594
15 21500019 201952
16 21500019 203471
17 21500019 203934
18 21500019 203118
19 21500019 203501
20 21500026 203145", header=TRUE)