I have a dataset like this:
structure(list(age = c(23, 25, 60, 12), sex = c(0, 1, 0, 1),
bmi = c(25, 30, 23, 24), disease = c(0, 1, 0, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
I want to calculate the mean (SD) or frequency (percentage) of each variable, split by sex (men: sex = 1, women: sex = 0). Afterwards, I want to report the results in a table.
I want R to choose automatically between mean ± SD and frequency (percentage) depending on the type of data. For instance, age and bmi are continuous (mean ± SD), while disease is binary (0 or 1, where 1 = disease present), so for disease I want the number of patients with disease = 1.
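To make the rule concrete, here is a minimal base R sketch of the choice I have in mind. The helper name `summarise_column` is hypothetical, and it assumes that any column containing only 0 and 1 (plus missing values) should be treated as binary, which would misclassify a continuous variable that happens to take only those values.

```r
# Hypothetical helper: choose the summary from the column's content.
summarise_column <- function(x) {
  # Assumption: a column with only 0/1 (and NA) values is binary.
  is_binary <- all(x %in% c(0, 1, NA))
  if (is_binary) {
    # Number of 1s and their share among non-missing values
    n <- sum(x == 1, na.rm = TRUE)
    sprintf("%d (%.0f %%)", n, 100 * n / sum(!is.na(x)))
  } else {
    # Continuous column: mean and standard deviation
    sprintf("%.1f ± %.1f", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
  }
}

summarise_column(c(0, 1, 0, 1))      # disease column: count (percentage)
summarise_column(c(25, 30, 23, 24))  # bmi column: mean ± SD
```

Applied to the whole disease column this gives the count and percentage of patients with disease = 1; the same function would then need to be applied within each sex group.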
This is an example of the final result that I want to have:
dt <- structure(list(age = c(23, 25, 60, 12), sex = c(0, 1, 0, 1),
bmi = c(25, 30, 23, 24), disease = c(0, 1, 0, 1)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
library(data.table)
setDT(dt) # make it a data.table
# prepare your data
dt[, sex := factor(sex, labels = c("Women", "Men"))]
dt[, disease := as.logical(disease)]
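# summarise each column within sex, then reshape to one row per variable
dcast(melt(dt[, lapply(.SD, \(x) {
  # only numeric and logical columns are handled
  switch(class(x),
    "numeric" = sprintf("%.0f ± %.0f", mean(x), sd(x)),               # mean ± SD
    "logical" = sprintf("%.0f (%.0f %%)", sum(x), 100 * sum(x) / .N)  # n (%)
  )
}), sex], id.vars = "sex"), variable ~ sex)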
#    variable    Women       Men
# 1:      age  42 ± 26    18 ± 9
# 2:      bmi   24 ± 1    27 ± 4
# 3:  disease  0 (0 %) 2 (100 %)
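
The code above converts `disease` to logical by hand. With many 0/1 columns, that step could be automated. The following is a sketch in base R only, under the assumption that every column other than the grouping variable whose values are all 0 or 1 is meant to be binary; the names `group_var`, `is_binary` and `binary_cols` are illustrative.

```r
# Same data as above, as a plain data.frame (base R only).
df <- data.frame(age = c(23, 25, 60, 12), sex = c(0, 1, 0, 1),
                 bmi = c(25, 30, 23, 24), disease = c(0, 1, 0, 1))

group_var <- "sex"  # assumed name of the grouping column

# Assumption: a column is binary if all non-missing values are 0 or 1.
is_binary <- vapply(df, function(x) all(x %in% c(0, 1, NA)), logical(1))
binary_cols <- setdiff(names(df)[is_binary], group_var)

# Convert the detected columns to logical; the others stay numeric.
df[binary_cols] <- lapply(df[binary_cols], as.logical)

binary_cols        # names of the columns that were converted
sapply(df, class)  # resulting column classes
```

The resulting data frame can then be passed to `setDT()` and summarised with the `switch()` call as before, since the detected columns now have class "logical".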