Search code examples
rggplot2pivot-table

Count pairs occurence and plot them


I have this dataset called x for example:

client_id a b c
1 1 0 0
2 0 1 1
3 0 0 1
4 1 1 1

I then want to create another table, that calculates the number of occurence of each possible pair in the table. So the result would be something like this:

pair count
a 1
b 0
c 1
a,b 0
a,c 0
b,c 1
a,b,c 1

That means for example that for the first row, a occured (was equal to 1) alone (means other columns didn't, and so they were at 0) only once.

After this, I want the best way to visualize the result of the pairs in ggplot.

Edit: More explanation of the second table:

a (single letter): a is the strictly one which occured (rest are 0). a,b (two letters): a and b both occured, but not the rest. a,b,c (three letters): all a,b,c occured at the same time (all are set to 1).


Solution

  • Here's the pivoting part, not sure of the best way (atm) to plot.

    library(dplyr)
    poss <- data.frame(pair=unlist(sapply(seq_along(names(quux)[-1]), function(z) combn(x = names(quux)[-1], z, FUN = function(y) paste(y, collapse=",")))))
    poss
    #    pair
    # 1     a
    # 2     b
    # 3     c
    # 4   a,b
    # 5   a,c
    # 6   b,c
    # 7 a,b,c
    
    
    library(dplyr)
    library(tidyr)
    quux %>%
      pivot_longer(-client_id) %>%
      summarize(pair = paste(sort(unique(name[value==1])), collapse = ","), .by = client_id) %>%
      count(pair) %>%
      full_join(poss, by="pair") %>%
      mutate(n = coalesce(n, 0)) %>%
      arrange(nchar(pair), pair)
    # # A tibble: 7 × 2
    #   pair      n
    #   <chr> <dbl>
    # 1 a         1
    # 2 b         0
    # 3 c         1
    # 4 a,b       0
    # 5 a,c       0
    # 6 b,c       1
    # 7 a,b,c     1
    

    Data


    quux <- structure(list(client_id = 1:4, a = c(1L, 0L, 0L, 1L), b = c(0L, 1L, 0L, 1L), c = c(0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -4L))