Keep row count of zero when conditionally counting in data.table's .N


Similar questions have been asked here and here, but the answers either do not handle a filter condition or require splitting off the zero-count groups at the start and merging them back in.

library(data.table)   
as.data.table(iris)[Sepal.Length > 6, .(n=.N), .(Species)]

returns

      Species     n
       <fctr> <int>
1: versicolor    20
2:  virginica    41

but suppose I also want to include setosa with a count of 0.

This can be done in dplyr with

library(dplyr)
iris %>%
  group_by(Species, .drop=FALSE) %>%
  filter(Sepal.Length > 6) %>%
  summarize(n = n())

  Species        n
  <fct>      <int>
1 setosa         0
2 versicolor    20
3 virginica     41

What would be a proper way to do it in data.table?

Thank you.


Solution

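  • You can do the conditional count in j rather than filtering in i. Filtering in i removes every setosa row before grouping, so that group never appears in the result; summing the logical condition in j keeps all groups, e.g.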

    as.data.table(iris)[, .(n = sum(Sepal.Length > 6)), by = Species]
          Species     n
           <fctr> <int>
    1:     setosa     0
    2: versicolor    20
    3:  virginica    41
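
For comparison, here is a minimal sketch of two ways to keep the zero-count group. The first restates the answer above. The second is an alternative that still uses .N, by joining the filtered rows back to the full set of Species values with by = .EACHI. The variable names (dt, all_species, counts_sum, counts_join) are illustrative and not from the original post, and na.rm = TRUE is only a precaution, since iris has no missing values.

```r
library(data.table)

dt <- as.data.table(iris)

# Count in j: sum() over a logical vector counts the TRUE values,
# so a group with no matching rows yields 0 instead of disappearing.
counts_sum <- dt[, .(n = sum(Sepal.Length > 6, na.rm = TRUE)), by = Species]

# Alternative sketch that keeps .N: filter first, then join the filtered
# rows against every Species value. With by = .EACHI, .N is computed once
# per row of all_species and is 0 where nothing matches.
all_species <- unique(dt[, .(Species)])
counts_join <- dt[Sepal.Length > 6][all_species, .(n = .N),
                                    on = "Species", by = .EACHI]

print(counts_sum)
print(counts_join)
```

The sum() form is shorter when there is a single condition. The join form may be more convenient when the filtered table is needed for other steps anyway.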