Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package


I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one, if that is not unreasonably challenging for this package.

For example

library(stargazer)
stargazer(ToothGrowth, type = "text")
#> 
#> =========================================
#> Statistic N   Mean  St. Dev.  Min   Max  
#> -----------------------------------------
#> len       60 18.813  7.649   4.200 33.900
#> dose      60 1.167   0.629   0.500 2.000 
#> -----------------------------------------

provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.
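For reference, the separate-tables route mentioned above could look like the following minimal sketch. It assumes base R's split() to divide the numeric columns by supp; the by_supp name is only illustrative.

```r
library(stargazer)

# Split the numeric columns of ToothGrowth into one data frame per level of supp
by_supp <- split(ToothGrowth[, c("len", "dose")], ToothGrowth$supp)

# Print one stargazer summary table per group, titled with the group name
for (s in names(by_supp)) {
  stargazer(by_supp[[s]], type = "text", title = paste("supp =", s))
}
```

This prints one table per supplement type, which is exactly what a single combined table is meant to avoid.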

Two suggestions for the desired outcome:

stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic          N   Mean  St. Dev.  Min   Max
#> --------------------------------------------------
#> OJ       len      30 20.663   6.606  8.200 30.900
#>          dose     30  1.167   0.634  0.500  2.000
#> VC       len      30 16.963   8.266  4.200 33.900
#>          dose     30  1.167   0.634  0.500  2.000
#> --------------------------------------------------
stargazer(ToothGrowth ~ supp, type = "text")
#> 
#> ==================================================
#> Statistic          N   Mean  St. Dev.  Min   Max
#> --------------------------------------------------
#> len
#>        _by OJ     30 20.663   6.606  8.200 30.900
#>        _by VC     30 16.963   8.266  4.200 33.900
#> _tot              60 18.813   7.649  4.200 33.900
#> 
#> dose
#>        _by OJ     30  1.167   0.634  0.500  2.000
#>        _by VC     30  1.167   0.634  0.500  2.000
#> _tot              60  1.167   0.629  0.500  2.000
#> --------------------------------------------------

Solution


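    library(stargazer)
    library(dplyr)
    library(tidyr)
    
    ToothGrowth %>%
        # number the observations within each supp group so rows stay unique
        group_by(supp) %>%
        mutate(id = row_number()) %>%
        ungroup() %>%
        # long format: one row per observation and variable
        gather(temp, val, len, dose) %>%
        # combine group and variable name, e.g. "OJ_len"
        unite(temp1, supp, temp, sep = '_') %>%
        # wide format: one column per group-variable pair
        spread(temp1, val) %>%
        # drop the helper id so it is not summarised
        select(-id) %>%
        # convert the tibble back to a plain data frame for stargazer
        as.data.frame() %>%
        stargazer(type = 'text')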
    

    Result

    =========================================
    Statistic N   Mean  St. Dev.  Min   Max  
    -----------------------------------------
    OJ_dose   30 1.167   0.634   0.500 2.000 
    OJ_len    30 20.663  6.606   8.200 30.900
    VC_dose   30 1.167   0.634   0.500 2.000 
    VC_len    30 16.963  8.266   4.200 33.900
    -----------------------------------------
    

    Explanation

    This addresses the problem the OP raised in a comment on the original answer: "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I found to do that with stargazer was to build a new data frame with one column per group-variable combination, using a gather(), unite(), spread() strategy. The only trick is that spread() needs every row to be uniquely identified, so an id is created within each group and dropped again before calling stargazer().
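In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshaping strategy can be sketched with those functions as follows. This is an alternative formulation, not part of the original answer, and the names wide, variable and value are illustrative choices.

```r
library(stargazer)
library(dplyr)
library(tidyr)

wide <- ToothGrowth %>%
  # number the observations within each supp group so rows stay unique
  group_by(supp) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  # long format: one row per observation and variable
  pivot_longer(c(len, dose), names_to = "variable", values_to = "value") %>%
  # wide format: one column per supp-variable pair, e.g. "OJ_len"
  pivot_wider(id_cols = id,
              names_from = c(supp, variable),
              values_from = value,
              names_sep = "_") %>%
  # drop the helper id so it is not summarised
  select(-id) %>%
  # convert the tibble back to a plain data frame for stargazer
  as.data.frame()

stargazer(wide, type = "text")
```

The resulting table should contain the same four rows as above, though the row order may differ because pivot_wider() keeps columns in order of first appearance rather than sorting them.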