
count unique combinations of variable values in an R dataframe column


I want to count the unique combinations of a variable that appear per group. For example:

df <- data.frame(id = c(1,1,1,2,2,2,3,3,4,4,4,5,6,6,7,7,7),
                 status =  c("a","b","c","a","b","c","b","c","b","c","d","b","b","c","b","c", "d"))

> df
   id status
1   1      a
2   1      b
3   1      c
4   2      a
5   2      b
6   2      c
7   3      b
8   3      c
9   4      b
10  4      c
11  4      d
12  5      b
13  6      b
14  6      c
15  7      b
16  7      c
17  7      d

I want to be able to tally how many times each combination of "status" values appears. By hand, for example, I can see that "a,b,c" appears twice in total (ids 1 and 2).

These questions seem similar, but I couldn't work out how to apply them in R and would appreciate a clearer explanation: "Counting unique combinations" and "Count of unique combinations despite order".

The result I think I am looking for would be something like:

abc 2
bc  2
b   1
...
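For comparison, a minimal base R sketch (not taken from the answers below) that needs no packages: `split()` groups the statuses by id, each group is collapsed into one string, and `table()` tallies the strings. Sorting the unique statuses before pasting is an added assumption, so that the same set of values in a different row order counts as the same combination.

```r
df <- data.frame(id = c(1,1,1,2,2,2,3,3,4,4,4,5,6,6,7,7,7),
                 status = c("a","b","c","a","b","c","b","c","b","c","d","b","b","c","b","c","d"))

# One string per id: unique statuses, sorted, pasted together (e.g. "abc")
combos <- sapply(split(df$status, df$id),
                 function(s) paste(sort(unique(s)), collapse = ""))

# Tally how often each combination occurs across ids
combo_counts <- table(combos)
combo_counts
# abc: 2, b: 1, bc: 2, bcd: 2
```

With this data, "bc" comes from ids 3 and 6 and "bcd" from ids 4 and 7.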

Solution

  • An option with the tidyverse: group by 'id', paste each group's 'status' values into a single string, and count how often each string occurs

    library(dplyr)
    library(stringr)
    df %>% 
       group_by(id) %>%                                       # one group per id
       summarise(status = str_c(status, collapse = "")) %>%  # e.g. "a","b","c" -> "abc"
       count(status)                                          # tally each combination
    # A tibble: 4 x 2
    #  status     n
    #  <chr>  <int>
    #1 abc        2
    #2 b          1
    #3 bc         2
    #4 bcd        2
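The pipeline above pastes statuses in whatever order the rows appear, so an id whose rows are c, a, b would produce "cab" rather than "abc". A hedged variant, assuming order should not matter (as in "Count of unique combinations despite order"), sorts and de-duplicates inside `summarise()`. Using `collapse = ","` is also an assumption, chosen so that multi-character status values stay unambiguous ("ab" + "c" vs "a" + "bc"). The rows are shuffled here only to demonstrate that the counts do not depend on row order.

```r
library(dplyr)
library(stringr)

df <- data.frame(id = c(1,1,1,2,2,2,3,3,4,4,4,5,6,6,7,7,7),
                 status = c("a","b","c","a","b","c","b","c","b","c","d","b","b","c","b","c","d"))

# Shuffle rows so statuses no longer arrive in sorted order within each id
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

combo_counts <- shuffled %>%
  group_by(id) %>%
  # sort() makes "c,a,b" and "a,b,c" identical; unique() drops repeated statuses
  summarise(status = str_c(sort(unique(status)), collapse = ",")) %>%
  count(status)

combo_counts
```

The result should match the answer's counts ("a,b,c" 2, "b" 1, "b,c" 2, "b,c,d" 2), only with commas between the values.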