I have this dataset called x for example:
client_id | a | b | c |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 1 |
3 | 0 | 0 | 1 |
4 | 1 | 1 | 1 |
I then want to create another table, that calculates the number of occurence of each possible pair in the table. So the result would be something like this:
pair | count |
---|---|
a | 1 |
b | 0 |
c | 1 |
a,b | 0 |
a,c | 0 |
b,c | 1 |
a,b,c | 1 |
That means for example that for the first row, a occured (was equal to 1) alone (means other columns didn't, and so they were at 0) only once.
After this, I want the best way to visualize the result of the pairs in ggplot.
Edit: More explanation of the second table:
a (single letter): a is the strictly one which occured (rest are 0). a,b (two letters): a and b both occured, but not the rest. a,b,c (three letters): all a,b,c occured at the same time (all are set to 1).
Here's the pivoting part, not sure of the best way (atm) to plot.
library(dplyr)
poss <- data.frame(pair=unlist(sapply(seq_along(names(quux)[-1]), function(z) combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse=",")))))
poss
# pair
# 1 a
# 2 b
# 3 c
# 4 a,b
# 5 a,c
# 6 b,c
# 7 a,b,c
library(dplyr)
library(tidyr)
quux %>%
pivot_longer(-client_id) %>%
summarize(pair = paste(sort(unique(name[value==1])), collapse = ","), .by = client_id) %>%
count(pair) %>%
full_join(poss, by="pair") %>%
mutate(n = coalesce(n, 0)) %>%
arrange(nchar(pair), pair)
# # A tibble: 7 × 2
# pair n
# <chr> <dbl>
# 1 a 1
# 2 b 0
# 3 c 1
# 4 a,b 0
# 5 a,c 0
# 6 b,c 1
# 7 a,b,c 1
Data
quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))