I would like to report the summary statistics for a few variables of which some are categorical varibles. I want the the individual categories listed seperately and with the indiviudal frequency. I can not find any other post or explanations elsewhere, therefore I turn to stack overflow as my last resort.
This is the closest I can find to my problem, but it does not quite address the issue: Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package
currently the table looks like this:
I would like it to look more like this with regard to lisitng all values for the categorical variables but with the correct numbers of observations:
The code I used
sumstats <- demographics %>% select(treated, pandl_r2, pandl_r3, age, gender,
education, crypto.posession, crypto.knowledge,
trading.frequency)
stargazer(sumstats, out = "...")
dput looks like this
structure(list(ï..treated = c(0L, 0L, 0L, 0L, 0L, 0L), pandl_r2 = c(104.22,
93.83, -45.8, 175, 240.4, 120.4), pandl_r3 = c(63.7, 335.58,
-147.55, 165.55, 429.35, -48.03), treated = c(0L, 0L, 0L, 0L,
0L, 0L), age = c(22L, 26L, 30L, 23L, 23L, 25L), gender = c(1L,
1L, 5L, 2L, 2L, 2L), education = c(2L, 5L, 4L, 4L, 4L, 4L), crypto.posession = c(1L,
1L, 1L, 1L, 3L, 1L), crypto.knowledge = c(7L, 6L, 3L, 10L, 5L,
6L), trading.frequency = c(3L, 8L, 2L, 5L, 1L, 3L)), row.names = c(NA,
6L), class = "data.frame")
here head 10
ï..treated pandl_r2 pandl_r3 treated age gender education crypto.posession crypto.knowledge trading.frequency
1 0 104.22 63.70 0 22 1 2 1 7 3
2 0 93.83 335.58 0 26 1 5 1 6 8
3 0 -45.80 -147.55 0 30 5 4 1 3 2
4 0 175.00 165.55 0 23 2 4 1 10 5
5 0 240.40 429.35 0 23 2 4 3 5 1
6 0 120.40 -48.03 0 25 2 4 1 6 3
7 0 478.00 375.02 0 20 1 3 1 7 3
8 0 131.59 15.31 0 19 1 3 1 7 1
9 0 170.70 331.02 0 24 1 4 2 2 0
10 0 66.11 290.15 0 19 1 3 2 4 2
11 0 210.50 661.95 0 24 1 4 1 8 0
12 0 388.09 641.15 0 39 2 4 3 5 2
13 0 426.72 812.00 0 24 2 2 2 3 4
14 0 393.39 577.45 0 33 1 4 1 10 10
15 0 104.73 776.56 0 25 1 3 1 8 5
16 0 248.78 106.75 0 30 2 4 1 8 7
17 0 462.40 503.55 0 33 1 4 1 8 8
18 0 245.90 550.75 0 25 1 4 1 8 9
19 0 413.60 288.55 0 33 5 4 1 5 5
20 0 316.40 477.15 0 32 1 3 1 6 2
Every hint is appreciated, many thanks in advance
I'm assuming you want each categoric approached separately rather than in combination. You could start with
library(SmartEDA)
library(purrr)
map(c("gender","education" ),
~ExpCustomStat(demographics,
Cvar=.x,
Nvar=c("pandl_r2","pandl_r3") ,
stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'))
)
where nvar has the numeric's to assess and you list out the categories in the first input to the map. if you want all the results stacked you'd have to map the first column to a generic name before stacking like so
library(dplyr)
map_dfr(c("gender","education" ),
~ExpCustomStat(demographics,
Cvar=.x,
Nvar=c("pandl_r2","pandl_r3") ,
stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max')) |>
rename_at(1, \(x)"var") |> mutate(catname = .x) |> relocate(catname)
)