Number of categories not equal to a specific one


I have a data frame with many categorical columns. For each column, I would like to count the number of distinct categories other than "bla". For example:

> d1
# A tibble: 5 x 2
    x      y    
  <chr>  <chr>
1 yellow A    
2 green  A    
3 green  bla  
4 blue   B    
5 bla    bla  

How can I modify dplyr's

d1 %>% summarise_all(n_distinct)

to exclude the category "bla"? In this case, the answer should be 3 for column x and 2 for column y.


Solution

  • Using base::lengths():

    lengths(lapply(d1, function(i) unique(i[ i != "bla" ])))
    # x y 
    # 3 2
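
  • Staying within dplyr, as the question asks: a minimal sketch, assuming a dplyr version that provides across() (1.0.0 or later). The data frame d1 is rebuilt here from the printed example so the snippet is self-contained, and res is just an illustrative name. The idea is to drop the "bla" entries from each column before passing it to n_distinct():

```r
library(dplyr)

# Rebuild the example data from the question
d1 <- tibble(
  x = c("yellow", "green", "green", "blue", "bla"),
  y = c("A", "A", "bla", "B", "bla")
)

# For every column, remove "bla" and count the remaining distinct values
res <- d1 %>%
  summarise(across(everything(), ~ n_distinct(.x[.x != "bla"])))

res
```

    For the example data this gives 3 for x and 2 for y. If the columns can contain NA, note that n_distinct() counts NA as its own category unless na.rm = TRUE is passed.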