Search code examples
rsummarysummarization

How to summarizing nested groups in R


In a data frame like data below:

library(tidyverse)
ID <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y","Z", "a","b","c","d")
State <- rep(c("FL", "GA", "SC", "NC", "VA", "GA"), each = 5)
Location <- rep(c("alpha", "beta", "gamma"), each = 10)
Var3 <- rep(c("Bravo", "Charlie", "Delta", "Echo"), times = c(7,8,10,5))
Sex <- rep(c("M","F","M"), times = 10)
data <- data.frame(ID, State, Location, Var3, Sex)

I want to return a data frame, or a list of several data frames, that summarize each way the data can be grouped. I want to see how many individual IDs are in each State, Location, and Var3, how many M and F are in each State, Location, and Var3, how many Locations are in each State, ect... what is the best way to achieve this.


Solution

  • We can use count

    library(dplyr)
    data %>%
      count(State, Location, Var3, Sex)
    

    Also, to get rollup/cube way of hierarchial counts,

    library(data.table)
    rollup(as.data.table(data), j = .N, by = c("State","Location","Var3", "Sex"))
    #    State Location    Var3  Sex  N
    # 1:    FL    alpha   Bravo    M  3
    # 2:    FL    alpha   Bravo    F  2
    # 3:    GA    alpha   Bravo    M  2
    # 4:    GA    alpha Charlie    F  1
    # 5:    GA    alpha Charlie    M  2
    # 6:    SC     beta Charlie    F  2
    # 7:    SC     beta Charlie    M  3
    # 8:    NC     beta   Delta    M  3
    # 9:    NC     beta   Delta    F  2
    #10:    VA    gamma   Delta    M  4
    #11:    VA    gamma   Delta    F  1
    #12:    GA    gamma    Echo    F  2
    #13:    GA    gamma    Echo    M  3
    #14:    FL    alpha   Bravo <NA>  5
    #15:    GA    alpha   Bravo <NA>  2
    #16:    GA    alpha Charlie <NA>  3
    #17:    SC     beta Charlie <NA>  5
    #18:    NC     beta   Delta <NA>  5
    #19:    VA    gamma   Delta <NA>  5
    #20:    GA    gamma    Echo <NA>  5
    #21:    FL    alpha    <NA> <NA>  5
    #22:    GA    alpha    <NA> <NA>  5
    #23:    SC     beta    <NA> <NA>  5
    #24:    NC     beta    <NA> <NA>  5
    #25:    VA    gamma    <NA> <NA>  5
    #26:    GA    gamma    <NA> <NA>  5
    #27:    FL     <NA>    <NA> <NA>  5
    #28:    GA     <NA>    <NA> <NA> 10
    #29:    SC     <NA>    <NA> <NA>  5
    #30:    NC     <NA>    <NA> <NA>  5
    #31:    VA     <NA>    <NA> <NA>  5
    #32:  <NA>     <NA>    <NA> <NA> 30
    #    State Location    Var3  Sex  N
    

    Or use cube

    cube(as.data.table(data), j = .N, by = c("State","Location","Var3", "Sex"))
    #.   State Location    Var3  Sex  N
    #  1:    FL    alpha   Bravo    M  3
    #  2:    FL    alpha   Bravo    F  2
    #  3:    GA    alpha   Bravo    M  2
    #  4:    GA    alpha Charlie    F  1
    #  5:    GA    alpha Charlie    M  2
    # ---                               
    #111:  <NA>     <NA>   Delta <NA> 10
    #112:  <NA>     <NA>    Echo <NA>  5
    #113:  <NA>     <NA>    <NA>    M 20
    #114:  <NA>     <NA>    <NA>    F 10
    #115:  <NA>     <NA>    <NA> <NA> 30