Adding a tally/count of observations using mutate in r

I wanted to know if there was a more efficient way to add a tally to a dataset in R.

Using the mpg dataset, this is how I do it using the mpg dataset as an example.

mpg %>% 
  group_by(manufacturer) %>% 
  count() %>% 
  right_join(
    mpg
  )

So essentially, I want a count of the number of unique observations in the manufacturer column. It works fine as this is quite a small dataset, but I'm working with datasets with over 100k observations and wanted to find a better way to do it than to join in this way.

To give context, the number of unique observations are used as denominators for subsequent analyses.

Solution

If you want to go fast, you can try data.table:

library(data.table) 
res <- data.table(mpg)[,':='(cnt = .N), by = manufacturer]
res
     manufacturer  model displ year cyl      trans drv cty hwy fl   class cnt
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact  18
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact  18
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact  18
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact  18
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact  18
 ---                                                                         
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize  27
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize  27
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize  27
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize  27
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize  27

Benchmark (using @phiver nice solution) :

library(dplyr)
library(microbenchmark)

microbenchmark(dplyr      =  mpg %>% group_by(manufacturer) %>% add_tally() ,
               data.table =  data.table(mpg)[,':='(cnt = .N), by = manufacturer])

Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
      dplyr 8.201807 8.557434 9.599122 9.018660 9.922339 17.425479   100
 data.table 1.245440 1.370666 1.615039 1.470719 1.691733  6.391889   100