Count occurrences of factor in R, with zero counts reported


I want to count the number of occurrences of each level of a factor in a data frame, including levels that never occur. For example, the code below totals the quantity of events for each type:

library(plyr)
events <- data.frame(type = c('A', 'A', 'B'),
                       quantity = c(1, 2, 1))
ddply(events, .(type), summarise, quantity = sum(quantity))

The output is the following:

     type quantity
1    A        3
2    B        1

However, what if I know that there are three types of events A, B and C, and I also want to see the count for C which is 0? In other words, I want the output to be:

     type quantity
1    A        3
2    B        1
3    C        0

How do I do this? It feels like there should be a function defined to do this somewhere.

The following are my two not-so-good ideas about how to go about this.

Idea #1: Use a for loop. This would work, but it is widely said that if you are using a for loop in R you are doing something wrong and that there must be a better way.

Idea #2: Add dummy entries to the original data frame. This solution works but it feels like there should be a more elegant solution.

events <- data.frame(type = c('A', 'A', 'B'),
                       quantity = c(1, 2, 1))
events <- rbind(events, data.frame(type = 'C', quantity = 0))
ddply(events, .(type), summarise, quantity = sum(quantity))

Solution

  • You get this for free if you define the type column of events as a factor with the desired three levels, passed as the second argument to factor():

    R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')), 
    +                       quantity = c(1, 2, 1))
    R> events
      type quantity
    1    A        1
    2    A        2
    3    B        1
    R> table(events$type)
    
    A B C 
    2 1 0 
    R> 
    

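    Simply calling table() on the factor already does the right thing, since it reports a count for every level, including the unused C. ddply() can do the same if you tell it not to drop unused levels by passing .drop=FALSE: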

    R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
      type quantity
    1    A        3
    2    B        1
    3    C        0
    R>
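
If the data frame already exists with a plain character type column, as in the question, the same idea can be applied after the fact. The following is a minimal base R sketch, not part of the original answer: it re-declares the column as a factor with all three levels and then uses xtabs() instead of ddply() to total quantity per level. The name all_types is a hypothetical helper introduced only for this illustration.

```r
# Assumption: `events` starts out with a character `type` column, as in the
# question. `all_types` is a hypothetical name for the known event types.
events <- data.frame(type = c('A', 'A', 'B'),
                     quantity = c(1, 2, 1),
                     stringsAsFactors = FALSE)
all_types <- c('A', 'B', 'C')

# Re-declare the column as a factor that knows about every level,
# including levels that have no rows.
events$type <- factor(events$type, levels = all_types)

# xtabs() sums `quantity` within each level of `type`. Unused levels are
# kept by default, so C is reported with a total of 0.
totals <- as.data.frame(xtabs(quantity ~ type, data = events),
                        responseName = "quantity")
print(totals)
```

This should print one row per level, with C at 0, matching the output asked for in the question. The only step that matters is the call to factor() with explicit levels; once the column carries all three levels, table(), xtabs(), and ddply() with .drop=FALSE all report the empty one.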