Calculate multiple aggregations on several variables


I usually calculate a short summary table this way:

library(data.table)
library(tidyverse)

mtcars.dt <- data.table(mtcars)
mtcars_grby <- mtcars.dt %>% 
  # group by cyl
  group_by(cyl) %>% 
  # summarise
  summarise(
    # number of observations
    quantity = n(), 
    # average
    mpg_avg = mean(mpg, na.rm = TRUE), 
    # min and max weight
    wt_min = min(wt), 
    wt_max = max(wt))

mtcars_grby
# A tibble: 3 x 5
    cyl quantity mpg_avg wt_min wt_max
  <dbl>    <int>   <dbl>  <dbl>  <dbl>
1     4       11    26.7   1.51   3.19
2     6        7    19.7   2.62   3.46
3     8       14    15.1   3.17   5.42

What is the shortest way to do this with data.table only? The only approach I can think of is merging four separate aggregations, which is not very elegant.


Solution

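  • Compute all the summaries in a single j expression with .(...), use .N for the number of rows per group, and group with keyby= (which also sorts the result by cyl), like this: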

    library(data.table)
    mtcars.dt <- data.table(mtcars)
    
    mtcars.dt[, .(quantity = .N,
      mpg_avg = mean(mpg, na.rm = TRUE),
      wt_min = min(wt),
      wt_max = max(wt)),
      keyby = cyl]
    

    giving

    Key: <cyl>
         cyl quantity  mpg_avg wt_min wt_max
       <num>    <int>    <num>  <num>  <num>
    1:     4       11 26.66364  1.513  3.190
    2:     6        7 19.74286  2.620  3.460
    3:     8       14 15.10000  3.170  5.424
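
If the same summaries are needed for several columns, the idea above can probably be extended with .SD and .SDcols. The following is only a sketch under that assumption: the `cols` vector and the `_min`/`_max` column suffixes are illustrative choices, not part of the original answer.

```r
library(data.table)
mtcars.dt <- data.table(mtcars)

# Columns to summarise (an illustrative choice)
cols <- c("mpg", "wt")

res <- mtcars.dt[, c(
    # group size
    list(quantity = .N),
    # min and max of every column listed in .SDcols
    setNames(lapply(.SD, min), paste0(cols, "_min")),
    setNames(lapply(.SD, max), paste0(cols, "_max"))
  ),
  keyby = cyl, .SDcols = cols]

print(res)
```

Here j returns one named list per group, so everything is still computed in a single grouped pass and no merge of separate aggregations is needed.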