R summary table for discrete and continuous variables


My apologies if this question is a duplicate or has already been asked somewhere.

I would like to create summary tables in this format:

1) Discrete variables    : n/N (%)
2a) Continuous variables : mean (SD); N
2b) Continuous variables : median (IQR); N
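
To make the three layouts concrete, here is a minimal base R sketch of how each string could be computed by hand. The helper names (`fmt_discrete`, `fmt_mean_sd`, `fmt_median_iqr`) are hypothetical and not part of any package. The sketch assumes that N for a discrete variable is the total row count (missing values included), and that N for a continuous variable counts non-missing values only.

```r
# Hypothetical helpers (not from any package): one formatter per layout.

# 1) Discrete: n/N (%), with N taken as the total number of rows
fmt_discrete <- function(x, level) {
  n <- sum(x == level, na.rm = TRUE)
  N <- length(x)
  sprintf("%d/%d (%.0f%%)", n, N, 100 * n / N)
}

# 2a) Continuous: mean (SD); N, with N counting non-missing values
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f); %d",
          mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), sum(!is.na(x)))
}

# 2b) Continuous: median (IQR); N, with IQR as a single width (Q3 - Q1)
fmt_median_iqr <- function(x) {
  sprintf("%.2f (%.2f); %d",
          median(x, na.rm = TRUE), IQR(x, na.rm = TRUE), sum(!is.na(x)))
}

# Tiny illustrative inputs
sex <- c("Male", "Female", "Female", NA)
val <- c(1, 2, 3, 4)

fmt_discrete(sex, "Female")
fmt_mean_sd(val)
fmt_median_iqr(val)
```

Applying helpers like these over every column would work, but a table package handles the labelling and layout as well.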

For example, given this data:

# Example dataset
set.seed(123)
data <- data.frame(
  ChildSex = sample(c("Male", "Female"), 5006, replace = TRUE),
  col1 = rnorm(5006, mean = 300, sd = 100),
  col2 = rnorm(5006, mean = 400, sd = 150),
  col3 = rnorm(5006, mean = 470, sd = 200)
)

the expected summary should look like this:

Discrete Variables
Child sex
   Male                             2505/5006 (50%)
   Female                           2501/5006 (50%)
   Data missing                     0/5006 (0%)

Continuous Variables: mean (SD); N
   Col1                             299.90 (99.38); 5006
   Col2                             399.12 (151.53); 5006
Continuous Variables: median (IQR); N
   Col3                             465.85 (268.15); 5006

I have around 20 discrete variables and 30 continuous variables (18 summarised as mean (SD) and 12 as median (IQR)). I would like to create a summary table like the one above without having to enter the variable names or levels manually. Thanks in advance for any suggestions or advice.


Solution

    library(gtsummary)

    set.seed(123)
    data <- data.frame(
      # one NA added so that the "Data missing" row shows up
      ChildSex = c(sample(c("Male", "Female"), 5005, replace = TRUE), NA),
      col1 = rnorm(5006, mean = 300, sd = 100),
      col2 = rnorm(5006, mean = 400, sd = 150),
      col3 = rnorm(5006, mean = 470, sd = 200)
    )

    head(data)

    tbl_summary(data,
                type = list(all_continuous() ~ "continuous2"),
                statistic = list(c(col1, col2) ~ "{mean} ({sd}); {N_nonmiss}",
                                 # quartiles p25-p75, not a single IQR width
                                 col3 ~ "{median} ({p25}-{p75}); {N_nonmiss}",
                                 all_categorical() ~ "{n}/{N_nonmiss} ({p}%)"),
                # one digits entry per statistic, in the order they appear
                digits = list(c(col1, col2) ~ c(2, 2, 0),
                              col3 ~ c(2, 2, 2, 0)),
                missing = "ifany",
                missing_text = "Data missing",
                missing_stat = "{N_miss}/{N_obs} ({p_miss}%)")
    

    Gives

    [Image: the resulting summary table]
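
With around 50 variables, typing each column name into `statistic` gets tedious. The following is a minimal base R sketch of one way to build the variable groups automatically. It assumes that every numeric column is continuous and everything else is discrete, and that only the median/IQR variables are listed by hand. The data frame and the names `discrete_vars`, `continuous_vars`, `median_vars` and `mean_vars` are illustrative, not from the original post.

```r
# Toy data standing in for the real data frame; all names are illustrative.
df <- data.frame(
  sex    = c("Male", "Female", "Female"),
  smoker = factor(c("yes", "no", "no")),
  height = c(170.2, 165.0, 158.4),
  income = c(21000, 54000, 380000)
)

# Classify columns by storage type: numeric -> continuous, otherwise discrete
is_num          <- vapply(df, is.numeric, logical(1))
discrete_vars   <- names(df)[!is_num]
continuous_vars <- names(df)[is_num]

# Assumption: only the variables to be shown as median (IQR) are named by hand
median_vars <- c("income")
mean_vars   <- setdiff(continuous_vars, median_vars)

discrete_vars
mean_vars
median_vars
```

Character vectors like these can probably be passed to the `statistic` and `digits` formulas through a tidyselect helper, for example `all_of(median_vars) ~ "{median} ({p25}-{p75}); {N_nonmiss}"`. Note that categorical variables stored as numbers (such as 0/1 codes) would be classed as continuous by this rule and would need converting to factors first.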