Search code examples
rbootmedianconfidence-interval

Bootstrapping Two Medians with "boot" function in R


I am trying to implement the Bootstrapping Two Medians procedure either in R or in Matlab, by following the procedure described in that article:

  1. Bootstrap each sample separately, creating the sampling distribution for each median.
  2. Then calculate the difference between the medians, and create the sampling distribution of those differences. This is the sampling distribution we care about.
  3. Once we have that distribution we can establish a confidence interval on that, and report the result.
  4. If the confidence interval does not include 0, we can reject the null hypothesis that there is no difference between the medians of the two conditions.

I started to get this method in R:

x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)

However, I do not know how to "calculate the difference between the medians" from those "bx" and "by".

Any suggestion, in order to implement the Bootstrapping Two Medians method?


Solution

  • In order to call boot just once, define a function that accepts both vectors and returns the difference of the medians.

    library(boot)
    
    # data recreated with pseudo-RNG seed set
    # so that the results are reproducible
    set.seed(2024)
    x <- rnorm(100, mean = 10, sd = 2)
    y <- rnorm(100, mean = 12, sd = 3)
    
    # bootstrap statistic
    # argument 'data' is a 2 column matrix
    fboot <- function(data, i) {
      x <- data[i, 1L]
      y <- data[i, 2L]
      median(x) - median(y)
    }
    
    # set the replicates just once, makes the code simpler
    R <- 1000L
    
    # OP code
    bx <- boot(data = x, statistic = function(x,i) median(x[i]), R = R)
    by <- boot(data = y, statistic = function(y,i) median(y[i]), R = R)
    
    # using the function fboot
    b1 <- boot(data = cbind(x, y), statistic = fboot, R = R)
    
    # the results are comparable to the results above
    bb <- bx$t - by$t
    cut(bb, pretty(bb)) |> table()
    #> 
    #> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
    #>        11        78       301       387       184        31         8
    cut(b1$t, pretty(b1$t)) |> table()
    #> 
    #> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
    #>        11        76       299       385       164        57         8
    
    # see them
    old_par <- par(mfrow = c(1, 2))
    hist(bb)
    abline(v = mean(bb), col = "red")
    hist(b1$t)
    abline(v = mean(b1$t), col = "red")
    

    par(old_par)
    
    # final check
    median(x) - median(y)
    #> [1] -2.440257
    mean(b1$t)
    #> [1] -2.336104
    mean(bb)
    #> [1] -2.359999
    
    # 95% confidence intervals
    quantile(b1$t, c(0.025, 0.975))
    #>      2.5%     97.5% 
    #> -3.326367 -1.258051
    quantile(bb, c(0.025, 0.975))
    #>      2.5%     97.5% 
    #> -3.335830 -1.342738
    

    Created on 2024-09-23 with reprex v2.1.0