
When tbl_summary() calculates percentiles using stats::quantile(), does it default to the type 7 or the type 2 algorithm?


The tbl_summary() reference page states:

"Additionally, {p##} is available for percentiles, where ## is an integer from 0 to 100. For example, p25: quantile(probs=0.25, type=2)."

Does this mean that the type 2 algorithm is used in calculating percentiles using the quantile function? The documentation for stats::quantile reports that type = 7 is the default algorithm, and in my experience it seems that tbl_summary is using type = 7 not type = 2.

Is this accurate?

As a follow-up question, is it possible to change the algorithm "type" argument within the stats::quantile function implemented within tbl_summary's percentiles calculations?

Here is a representative example of the discrepancy between type 7 (the quantile default) and type 2 (the tbl_summary reference page):

library(gtsummary)

data <- data.table::data.table(values = c(120, 120, 140, 210))

tbl_summary(
    data,
    type = list(values ~ 'continuous'),
    statistic = list(values ~ "{p75}"),
    digits = everything() ~ 1
    )

stats::quantile() defaults to type = 7:

stats::quantile(data$values, probs = 0.75)

75% 157.5

The percentile example in the tbl_summary reference page uses type = 2 in the quantile call:

stats::quantile(data$values, probs = 0.75, type = 2)

75% 175

It would be helpful to control the algorithm "type" that quantile uses within tbl_summary. This would be particularly useful for how the IQRs are reported - especially for small N.
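
The two results above can be reproduced by hand, which may make the source of the gap clearer. The following is a minimal base R sketch of the type 7 and type 2 definitions as described in the stats::quantile documentation; the variable names (h, lo, hi, j, g, q7, q2) are illustrative only, and the sketch omits the floating-point fuzz that quantile() applies internally.

```r
# Hand computation of the 75th percentile for the example data.
x <- sort(c(120, 120, 140, 210))
n <- length(x)
p <- 0.75

# Type 7: linear interpolation at position h = (n - 1) * p + 1
h  <- (n - 1) * p + 1                      # 3.25 for this data
lo <- floor(h)
hi <- min(lo + 1, n)
q7 <- x[lo] + (h - lo) * (x[hi] - x[lo])   # 140 + 0.25 * (210 - 140)

# Type 2: inverse empirical CDF, averaging the two neighbours
# when n * p is a whole number
np <- n * p                                # 3 for this data
j  <- floor(np)
g  <- np - j
q2 <- if (g == 0) mean(x[c(j, min(j + 1, n))]) else x[min(j + 1, n)]

# Compare against the built-in implementation
c(type7 = q7, builtin7 = unname(quantile(x, p, type = 7)))
c(type2 = q2, builtin2 = unname(quantile(x, p, type = 2)))
```

With only four observations, type 7 interpolates a quarter of the way from 140 to 210, while type 2 averages 140 and 210, so the choice of algorithm moves the reported percentile noticeably.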


Solution

  • As of gtsummary v2.0, the percentiles {p##} are calculated with quantile(type=2). The default changed from type=7 to type=2 in the v2.0 release, and the change is documented in the changelog.

    The {p##} are just helpers to make our lives a bit easier. If you need other types of quantiles, you can just create functions that do what you prefer.

    library(gtsummary)
    
    p25_t7 <- \(x) quantile(x, probs = 0.25, type = 7)
    p75_t7 <- \(x) quantile(x, probs = 0.75, type = 7)
    
    trial |> 
      tbl_summary(
        include = age,
        statistic = all_continuous() ~ "{median} ({p25_t7}, {p75_t7})",
        missing = "no"
      ) |> 
      as_kable()
    
    | Characteristic | N = 200     |
    |:---------------|:-----------:|
    | Age            | 47 (38, 57) |

    Created on 2024-08-19 with reprex v2.1.0
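
The same approach can be applied to the data from the question. This is a sketch, not a definitive recipe: the helper name p75_t7 is hypothetical, a plain data.frame stands in for the data.table, and the tbl_summary() call is guarded so the snippet still runs where gtsummary is not installed.

```r
# Hypothetical helper: 75th percentile using the type 7 algorithm
p75_t7 <- function(x) {
  unname(stats::quantile(x, probs = 0.75, type = 7, na.rm = TRUE))
}

# Data from the question
data <- data.frame(values = c(120, 120, 140, 210))

# The helper on its own, outside of tbl_summary()
p75_t7(data$values)

# Show the built-in {p75} next to the custom {p75_t7} in one table
if (requireNamespace("gtsummary", quietly = TRUE)) {
  library(gtsummary)
  tbl <- tbl_summary(
    data,
    type = list(values ~ "continuous"),
    statistic = list(values ~ "{p75} vs {p75_t7}"),
    digits = everything() ~ 1
  )
  print(as_kable(tbl))
}
```

If the v2.0 behavior described above holds, the two statistics should differ for this data: 175 for {p75} (type 2) and 157.5 for {p75_t7} (type 7), matching the stats::quantile() results in the question.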