Search code examples
rhtml-tablestatisticsstargazer

Summary statistics for each category of categorical variables in R


I would like to report the summary statistics for a few variables of which some are categorical varibles. I want the the individual categories listed seperately and with the indiviudal frequency. I can not find any other post or explanations elsewhere, therefore I turn to stack overflow as my last resort.

This is the closest I can find to my problem, but it does not quite address the issue: Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package

currently the table looks like this:

enter image description here

I would like it to look more like this with regard to lisitng all values for the categorical variables but with the correct numbers of observations:

enter image description here

The code I used

sumstats <- demographics %>% select(treated, pandl_r2, pandl_r3, age, gender, 
                                    education, crypto.posession, crypto.knowledge, 
                                    trading.frequency) 

stargazer(sumstats, out = "...")

dput looks like this

structure(list(ï..treated = c(0L, 0L, 0L, 0L, 0L, 0L), pandl_r2 = c(104.22, 
93.83, -45.8, 175, 240.4, 120.4), pandl_r3 = c(63.7, 335.58, 
-147.55, 165.55, 429.35, -48.03), treated = c(0L, 0L, 0L, 0L, 
0L, 0L), age = c(22L, 26L, 30L, 23L, 23L, 25L), gender = c(1L, 
1L, 5L, 2L, 2L, 2L), education = c(2L, 5L, 4L, 4L, 4L, 4L), crypto.posession = c(1L, 
1L, 1L, 1L, 3L, 1L), crypto.knowledge = c(7L, 6L, 3L, 10L, 5L, 
6L), trading.frequency = c(3L, 8L, 2L, 5L, 1L, 3L)), row.names = c(NA, 
6L), class = "data.frame")

here head 10

 ï..treated pandl_r2 pandl_r3 treated age gender education crypto.posession crypto.knowledge trading.frequency
1           0   104.22    63.70       0  22      1         2                1                7                 3
2           0    93.83   335.58       0  26      1         5                1                6                 8
3           0   -45.80  -147.55       0  30      5         4                1                3                 2
4           0   175.00   165.55       0  23      2         4                1               10                 5
5           0   240.40   429.35       0  23      2         4                3                5                 1
6           0   120.40   -48.03       0  25      2         4                1                6                 3
7           0   478.00   375.02       0  20      1         3                1                7                 3
8           0   131.59    15.31       0  19      1         3                1                7                 1
9           0   170.70   331.02       0  24      1         4                2                2                 0
10          0    66.11   290.15       0  19      1         3                2                4                 2
11          0   210.50   661.95       0  24      1         4                1                8                 0
12          0   388.09   641.15       0  39      2         4                3                5                 2
13          0   426.72   812.00       0  24      2         2                2                3                 4
14          0   393.39   577.45       0  33      1         4                1               10                10
15          0   104.73   776.56       0  25      1         3                1                8                 5
16          0   248.78   106.75       0  30      2         4                1                8                 7
17          0   462.40   503.55       0  33      1         4                1                8                 8
18          0   245.90   550.75       0  25      1         4                1                8                 9
19          0   413.60   288.55       0  33      5         4                1                5                 5
20          0   316.40   477.15       0  32      1         3                1                6                 2

Every hint is appreciated, many thanks in advance


Solution

  • I'm assuming you want each categoric approached separately rather than in combination. You could start with

    library(SmartEDA)
    library(purrr)
    map(c("gender","education" ),
        ~ExpCustomStat(demographics,       
                      Cvar=.x, 
                      Nvar=c("pandl_r2","pandl_r3") ,
                      stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'))
        )
    

    where nvar has the numeric's to assess and you list out the categories in the first input to the map. if you want all the results stacked you'd have to map the first column to a generic name before stacking like so

    library(dplyr)
    map_dfr(c("gender","education" ),
        ~ExpCustomStat(demographics,       
                      Cvar=.x, 
                      Nvar=c("pandl_r2","pandl_r3") ,
                      stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max')) |>
          rename_at(1, \(x)"var") |> mutate(catname = .x) |> relocate(catname)
        )