tbl_summary and numeric variables


tbl_summary() from the gtsummary package does not treat all numeric variables the same way, and I can't figure out how to change that. For example:

mtcars has only numeric variables, so when I run this I expect the mean of every variable to be calculated. Instead, cyl, gear and carb are treated as categorical.

library(gtsummary)

tbl_summary(mtcars, statistic = list(all_numeric() ~ "{mean} ({sd})",
                                      all_categorical() ~ "{n} / {N} ({p}%)"))

I actually have a much bigger dataset, and tbl_summary is treating some of its numeric variables as categorical. Could it be because the N is so small (say, many rows are missing) and tbl_summary does not want to calculate a mean for such a small N?

I can't wrap my mind around this!

Just a further example from my data. Q12_5_TEXT is a numeric variable, but this is the output from tbl_summary.

[Image: tbl_summary output for Q12_5_TEXT]


Solution

  • Variables with few unique levels are summarized categorically. For example, mtcars$cyl only has three unique levels: 4, 6, 8. With only three levels, a categorical summary is more appropriate than a mean or median.

    Use the type= argument to change the default summary type.
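
A minimal sketch of the suggestion above, using mtcars from the question. It first counts the unique values in each column to show why cyl, gear and carb end up as categorical, then passes type= to force a continuous summary for those three. As far as I know the default rule looks at the number of unique values (fewer than 10 for a numeric variable), not at how many rows are non-missing, but treat that threshold as an assumption and check ?tbl_summary for your installed version. The names n_unique and tbl are only illustrative.

```r
library(gtsummary)

# Count distinct non-missing values per column; low counts explain
# why some numeric columns get a categorical summary by default.
n_unique <- sapply(mtcars, function(x) length(unique(na.omit(x))))
print(n_unique[c("cyl", "gear", "carb")])

# Override the default summary type for the low-cardinality columns,
# then request mean (SD) for everything summarized as continuous.
tbl <- tbl_summary(
  mtcars,
  type = list(c(cyl, gear, carb) ~ "continuous"),
  statistic = list(
    all_continuous() ~ "{mean} ({sd})",
    all_categorical() ~ "{n} / {N} ({p}%)"
  )
)
tbl
```

The same pattern should carry over to a column such as Q12_5_TEXT: list it on the left-hand side of the type= formula. vs and am are left at their default here, so they still get the {n} / {N} ({p}%) summary.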