Create table with means and frequencies in R


I have a dataset like this:

structure(list(age = c(23, 25, 60, 12), sex = c(0, 1, 0, 1), 
    bmi = c(25, 30, 23, 24), disease = c(0, 1, 0, 1)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L))

I want to calculate the mean (SD) or the frequency (percentage) of each variable, split by sex (sex = 1 for men, sex = 0 for women). Afterwards, I want to report the results in a table.

I want R to choose automatically between mean ± SD and frequency (percentage) depending on the type of data. For instance, age and bmi are continuous (mean ± SD), while disease is binary (0 or 1, where 1 means the disease is present), so its frequency should be the number of patients with disease = 1.

This is an example of the final result that I want to have:

[Image: example of the desired summary table]


Solution

    dt <- structure(list(age = c(23, 25, 60, 12), sex = c(0, 1, 0, 1),
                         bmi = c(25, 30, 23, 24), disease = c(0, 1, 0, 1)),
                    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
    
    library(data.table)
    
    setDT(dt) # make it a data.table
    
    # prepare your data
    dt[, sex := factor(sex, labels = c("Women", "Men"))]
    dt[, disease := as.logical(disease)]
    
    
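    # Summarise every column per sex, choosing the format by column class:
    # numeric -> "mean ± SD", logical -> "n (%)".
    # \(x) is the R >= 4.1 shorthand for function(x).
    # melt() + dcast() then reshape the result to one row per variable
    # and one column per sex.
    dcast(melt(dt[, lapply(.SD, \(x) {
      switch(class(x),
             "numeric" = sprintf("%.0f ± %.0f", mean(x), sd(x)),
             "logical" = sprintf("%.0f (%.0f %%)", sum(x), 100 * sum(x) / .N)
      )
    }), sex], id.vars = "sex"), variable ~ sex)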
    
    #    variable    Women         Men
    # 1:      age  42 ± 26      18 ± 9
    # 2:      bmi   24 ± 1      27 ± 4
    # 3:  disease  0 (0 %)   2 (100 %)
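
One thing to watch with the approach above: switch(class(x), ...) returns NULL for any class that is not listed, for example an integer column. The following is a minimal base R sketch of the same idea that tests the type with is.logical() and is.numeric() instead. The helper name summarise_column and the integer age column are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): format one column
# as "n (%)" if it is logical, or "mean ± SD" if it is numeric.
summarise_column <- function(x) {
  if (is.logical(x)) {
    sprintf("%d (%.0f %%)", sum(x), 100 * mean(x))
  } else if (is.numeric(x)) {        # TRUE for both double and integer
    sprintf("%.0f ± %.0f", mean(x), sd(x))
  } else {
    NA_character_                    # unsupported type: leave the cell empty
  }
}

# Same data as in the question; age is stored as integer on purpose
# to show that integer columns are handled too.
df <- data.frame(
  age     = c(23L, 25L, 60L, 12L),
  sex     = factor(c(0, 1, 0, 1), levels = c(0, 1), labels = c("Women", "Men")),
  bmi     = c(25, 30, 23, 24),
  disease = as.logical(c(0, 1, 0, 1))
)

# Split the non-grouping columns by sex, then summarise each column
# within each group. The result is a character matrix with one row per
# variable and one column per sex.
vars   <- setdiff(names(df), "sex")
groups <- split(df[vars], df$sex)
result <- sapply(groups, function(g) vapply(g, summarise_column, character(1)))

print(result)
```

The result is a plain character matrix, so it can be passed to as.data.frame() or to a table-formatting package if a nicer report is needed.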