Search code examples
rconfidence-intervalstatistics-bootstrap

How to calculate confidence interval using the "bootstrap function" in R


I am trying to calculate the confidence interval in R. Due to some special reasons, I have to do it with the functions in "bootstrap" package.(which means I can't use the functions in "boot" package.)

Here is my code.

And what I am doing is trying to calculate the Pearson correlation coefficient, and then apply the Bootstrap method (with B = 100) to obtain the estimate of the correlation coefficient. But I don't know how to construct the 95% confidence intervals.

library(bootstrap) 
data('law')

set.seed(1)
theta <- function(ind) {
  cor(law[ind, 1], law[ind, 2], method = "pearson")
  }
law.boot <- bootstrap(1:15, 100, theta) 
# sd(law$thetastar)
percent.95 <- function(x) {
  quantile(x,  .95)
  }
law.percent.95 <- bootstrap(1:15, 100, theta, func=percent.95)

Sorry if I didn't make myself clear or tag the wrong tags. Sorry twice for not producing a dataset (now it's provided) and thank professor Roland for point it out. Thanks very much!


Solution

  • Usually, after bootstrapping we use the 2.5% and 97.5% percentiles as a 95% confidence interval (because we subtract α/2=.025 from each side). See also @thothal's answer and the comments under the answers.

    R <- 1e5 - 1  ## number of bootstrap replications
    est <- with(law, cor(lsat, gpa))  ## naïve correlation
    
    theta <- function(ind) cor(law[ind, 1], law[ind, 2], method="pearson")
    set.seed(1)
    B1 <- bootstrap::bootstrap(seq(nrow(law)), R, theta) 
    (ci1 <- c(estimate=est, quantile(B1$thetastar, c(.025, .975))))
    #  estimate      2.5%     97.5% 
    # 0.7763745 0.4594845 0.9620884 
    

    Here an alternative approach from scratch:

    theta2 <- function(x) with(x, cor(lsat, gpa))
    set.seed(1)
    B2 <- replicate(R, theta2(law[sample(nrow(law), nrow(law), replace=TRUE), ]))
    (ci2 <- c(estimate=est, quantile(B2, c(.025, .975))))
    #  estimate      2.5%     97.5% 
    # 0.7763745 0.4607644 0.9617970 
    

    And finally an approach using the boot package which has a boot.ci function:

    theta3 <- function(data, k) cor(data[k, ])[1,2]
    set.seed(1)
    B3 <- boot::boot(law, theta3, R=R)
    (ci3 <- c(est, boot::boot.ci(B3, type='perc')$percent[4:5]))
    # [1] 0.7763745 0.4593727 0.9620923