r, confidence-interval, statistics-bootstrap

boot.ci() returns strange confidence interval for bootstrap samples


I have two groups of positive real numbers.

> dput(group1)
c(2.10753, 2.57251, 2.61687, 4.62551, 7.13166, 6347.63, 4.22139, 
10.7373, 2.11568, 2.71866, 4.09376, 10.9046, 109807, 5.87156, 
3.17082, 3.4703, 2.47262, 9.24319, 34.6945, 5.72567, 12.0134, 
108.33, 6.60707, 6.24304, 3.59048, 10.3174, 48.0265, 5.32097, 
3.77157, 6.67401, 22.633, 34.8186, 21.5315, 9.42882, 7.10627, ...)

> dput(group2)
c(4.88474, 65.4318, 128.101, 24.1271, 5.44262, 54.8987, 2.85175, 
14.1089, 172.23, 66.8563, 6.74067, 2.19603, 2.12985, 4.12735,
16.401, 3.22688, 15.6943, 4.32861, 36.4752, 7.33769, 75.855, 
62.7653, 35.1786, 3.71099, 29.0186, 34.4472, 19.1061, 2.75174, ...)

Group 1 consists of ~1000 values, group 2 of ~30,000. I am interested in the ratio of medians between the two groups and used the following R function to calculate this ratio for each of 2000 bootstrap samples (the boot() call appears in the output further down):

medianRatio <- function(x, i, noGroup1, noGroup2) {
    all <- x[i]
    currGroup1 <- all[1:noGroup1]
    currGroup2 <- all[(noGroup1 + 1):length(all)]
    ratio <- median(currGroup1) / median(currGroup2)
    return(ratio)
}

The output of the boot() call looks like this:

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = c(group1, group2), statistic = medianRatio, R = noBs, 
    noGroup1 = length(group1), noGroup2 = length(group2))


Bootstrap Statistics :
    original      bias    std. error
t1*  1.08847 -0.08597889  0.05451763

The mean of the bootstrap replicates is 1.002 and the sd is 0.054 (on visual inspection, the histogram looks approximately normal and centered around 1). Also:

> range(Group1_Group2.BS$t)
[1] 0.823311 1.198469

Yet when I run boot.ci() on the boot object, the reported confidence interval is:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = Group1_Group2.BS, type = "norm")

Intervals : 
Level      Normal        
95%   ( 1.068,  1.281 )  
Calculations and Intervals on Original Scale

I do not understand what is going on here as the reported confidence interval does not even cover the mean of the (symmetric) distribution of the bootstrap samples. What am I missing?


Solution

  • Understanding how the normal confidence intervals are calculated might help clear things up. I found a very nice explanation in the answer to this question.

    The bootstrap normal confidence intervals are built around the observed value of the statistic, with a bias correction to account for the difference between the center of the bootstrap distribution and the observed value. The CI is calculated by taking the estimate of the statistic of interest from the original sample, subtracting the bias, and then subtracting and adding 1.96 times the bootstrap standard error. If you do that by hand with the values from the boot object, you get the same values as from boot.ci():

    1.08847 - -0.08597889 - 1.96*0.05451763
    [1] 1.067594
    1.08847 - -0.08597889 + 1.96*0.05451763
    [1] 1.281303
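
The by-hand calculation above can be wrapped in a small helper. This is only a sketch in base R: `normal_ci` is a hypothetical name, and it assumes the bias is estimated as the mean of the bootstrap replicates minus the observed statistic and the standard error as their standard deviation, which is consistent with the numbers printed by boot() here. The toy replicates are made up for illustration.

```r
# Hypothetical helper (not part of the boot package): bias-corrected
# normal-approximation interval from an observed statistic t0 and a
# vector of bootstrap replicates t.
normal_ci <- function(t0, t, conf = 0.95) {
  bias <- mean(t) - t0          # bootstrap estimate of bias
  se <- sd(t)                   # bootstrap standard error
  z <- qnorm((1 + conf) / 2)    # about 1.96 for conf = 0.95
  center <- t0 - bias           # algebraically 2 * t0 - mean(t)
  c(lower = center - z * se, upper = center + z * se)
}

# Toy replicates (made up): centered at 1.00 while t0 is 1.10
toy_t <- c(0.95, 1.00, 1.05)
toy_ci <- normal_ci(t0 = 1.10, t = toy_t)
print(toy_ci)
```

With a real boot object `b`, the equivalent call should be `normal_ci(b$t0, b$t)`. Because the center works out to `2 * t0 - mean(t)`, a negative bias pushes the interval above the observed value and away from the mean of the replicates, which is the pattern seen in the question.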
    

    Your bias is big enough to make the difference between, say, the percentile CI and the normal CI noticeable. When I think about bootstrap confidence intervals, I would have expected to use the mean of the bootstrap distribution minus the bias, not the observed statistic minus the bias. I haven't spent enough time thinking about this, nor do I have my bootstrapping notes with me, so you might want to check this problem a little more closely.
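
One thing that may be worth checking is where the bias comes from. In the question, the statistic receives indices drawn from the pooled vector c(group1, group2), so each resampled "group" is presumably a mix of both groups, and the ratio of medians would then be pulled toward 1. The sketch below uses base R and simulated log-normal data (not the data from the question; `ratio_of_medians`, `g1`, `g2`, and the sample sizes are made up) to compare pooled resampling with resampling within each group. With boot() itself, the `strata` argument is intended for this kind of within-group resampling.

```r
set.seed(1)
# Simulated stand-ins for the two groups (assumption, not the real data)
g1 <- rlnorm(200, meanlog = 1.0, sdlog = 1)
g2 <- rlnorm(2000, meanlog = 1.5, sdlog = 1)

ratio_of_medians <- function(a, b) median(a) / median(b)
t0 <- ratio_of_medians(g1, g2)

n1 <- length(g1)
n2 <- length(g2)
pooled <- c(g1, g2)
R <- 500

# Pooled resampling: indices come from the combined vector, so both
# "groups" in each replicate are draws from the same mixture.
t_pooled <- replicate(R, {
  s <- pooled[sample.int(n1 + n2, replace = TRUE)]
  ratio_of_medians(s[1:n1], s[(n1 + 1):(n1 + n2)])
})

# Stratified resampling: each group is resampled from itself only.
t_strat <- replicate(R, {
  ratio_of_medians(sample(g1, replace = TRUE), sample(g2, replace = TRUE))
})

bias_pooled <- mean(t_pooled) - t0
bias_strat <- mean(t_strat) - t0
print(c(t0 = t0, bias_pooled = bias_pooled, bias_strat = bias_strat))
```

If the same pattern holds for the real data, the large negative bias reported by boot() would reflect the resampling scheme rather than a problem with the normal interval itself.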