Search code examples
rggpubr

Differences in interquartile range calculations between IQR() and stats::quantile() in R and `ggpubr`


From my reading of the documentation associated with ggpubr::add_summary(), stats::IQR() and stats::quantile() I expected my use of median_iqr and median_q1q3 to return the same intervals. The documentation for IQR() says it's using quantile()... and yet, doesn't return the values I expected. Can someone explain?

sapply(c("reprex", "dplyr", "ggpubr"), require, character=TRUE)

tg <- ToothGrowth

# first two same
tg |>
  group_by(dose) |>
  summarise(med = median(len),
            iqr = median_q1q3(len))
#> # A tibble: 3 × 3
#>    dose   med iqr$y $ymin $ymax
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   0.5  9.85  9.85  7.22  12.2
#> 2   1   19.2  19.2  16.2   23.4
#> 3   2   26.0  26.0  23.5   27.8

tg |>
  group_by(dose) |>
  summarise(med = median(len),
            iqr = median_hilow_(len, ci=0.5))
#> # A tibble: 3 × 3
#>    dose   med iqr$y $ymin $ymax
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   0.5  9.85  9.85  7.22  12.2
#> 2   1   19.2  19.2  16.2   23.4
#> 3   2   26.0  26.0  23.5   27.8

# this is how the above are calculated:
tg |>
  group_by(dose) |>
  summarise(med = median(len),
            iqr = median_q1q3(len),
            lo = stats::quantile(len, probs = c(.25)),
            hi = stats::quantile(len, probs = c(.75)))
#> # A tibble: 3 × 5
#>    dose   med iqr$y $ymin $ymax    lo    hi
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   0.5  9.85  9.85  7.22  12.2  7.22  12.2
#> 2   1   19.2  19.2  16.2   23.4 16.2   23.4
#> 3   2   26.0  26.0  23.5   27.8 23.5   27.8

# 3rd one is using this derivation of +/- 1 ci
# this is how 3rd is calculated:
tg |>
  group_by(dose) |>
  summarise(med = median(len),
            iqr = median_iqr(len),
            ci = IQR(len, type = 7)) |>
  mutate(lo = med - ci,
         hi = med + ci)
#> # A tibble: 3 × 6
#>    dose   med iqr$y $ymin $ymax    ci    lo    hi
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   0.5  9.85  9.85  4.82  14.9  5.03  4.82  14.9
#> 2   1   19.2  19.2  12.1   26.4  7.12 12.1   26.4
#> 3   2   26.0  26.0  21.6   30.2  4.3  21.6   30.2

Created on 2024-04-29 by the reprex package (v2.0.1)


Solution

  • This seems to be intended behavior on the part of the {ggpubr} package authors.

    If you read the documentation for ?median_iqr() very closely it states:

    median_iqr(): returns the median and the error limits defined by the interquartile range.

    This is supposed to convey the output will be the (a) median, (b) median less the interquartile range, and (c) median plus the interquartile range.

    And in this now closed github issue, one of the package authors declined to change how median_iqr() works and instead promoted the use of median_q1q3() which is just a wrapper for median_hilow_(x, 0.5).

    As you already figured out:

    Formula 1

    The median along with the 25th and 75th percentile applies to median_q1q3() and median_hilow_(x, 0.5).

    library(dplyr)
    library(ggpubr)
    library(tidyr)
    library(purrr)
    
    toothgrowth_grouped_by_dose <- ToothGrowth |> group_by(dose)
    
    wrangle_for_left_join <- function(data, name) {
      data |>
        transmute(
          dose,
          !!name := sprintf("%05.2f [%05.2f, %05.2f]", y, ymin, ymax)
        )
    }
    
    dfs_formula_1 <- list(
      df_1 = toothgrowth_grouped_by_dose |>
        summarise(
          y = median(len),
          ymin = quantile(len, probs = 0.25),
          ymax = quantile(len, probs = 0.75)
        ) |>
        wrangle_for_left_join("formula_1"),
      
      df_2 = toothgrowth_grouped_by_dose |>
        summarise(x = median_q1q3(len)) |>
        unnest(cols = c(x)) |> 
        wrangle_for_left_join("median_q1q3"),
      
      df_3 = toothgrowth_grouped_by_dose |>
        summarise(x = median_hilow_(len, 0.5)) |>
        unnest(cols = c(x)) |>
        wrangle_for_left_join("median_hilow_")
    )
    
    reduce(dfs_formula_1, left_join, by = 'dose')
    #> # A tibble: 3 × 4
    #>    dose formula_1            median_q1q3          median_hilow_       
    #>   <dbl> <chr>                <chr>                <chr>               
    #> 1   0.5 09.85 [07.22, 12.25] 09.85 [07.22, 12.25] 09.85 [07.22, 12.25]
    #> 2   1   19.25 [16.25, 23.38] 19.25 [16.25, 23.38] 19.25 [16.25, 23.38]
    #> 3   2   25.95 [23.53, 27.83] 25.95 [23.53, 27.83] 25.95 [23.53, 27.83]
    

    Formula 2

    The median minus/plus the interquartile range applies to median_iqr().

    dfs_formula_2 <- list(
      df_4 = toothgrowth_grouped_by_dose |>
        summarise(
          y = median(len),
          iqr = IQR(len),
          ymin = y - iqr,
          ymax = y + iqr
        )|>
        wrangle_for_left_join("formula_2"),
        
      df_5 = toothgrowth_grouped_by_dose |>
        summarise(x = median_iqr(len)) |>
        unnest(cols = c(x)) |>
        wrangle_for_left_join("median_iqr")
    )
    
    reduce(dfs_formula_2, left_join, by = 'dose')
    #> # A tibble: 3 × 3
    #>    dose formula_2            median_iqr          
    #>   <dbl> <chr>                <chr>               
    #> 1   0.5 09.85 [04.82, 14.88] 09.85 [04.82, 14.88]
    #> 2   1   19.25 [12.12, 26.38] 19.25 [12.12, 26.38]
    #> 3   2   25.95 [21.65, 30.25] 25.95 [21.65, 30.25]
    

    Created on 2024-04-29 with reprex v2.1.0