Search code examples
rstatisticsreplicationconfidence-intervalstatistics-bootstrap

Bootstrap for Confidence Intervals


My problem is as follows: Firstly, I have to create 1000 bootstrap samples of size 100 of a "theta hat". I have a random variable X which follows a scaled t_5-distribution. The following code creates 1000 bootstrap samples of theta hat:

library("metRology", lib.loc="~/R/win-library/3.4")
 # Draw some data
data <- rt.scaled(100, df=5, mean=0, sd=2)

thetahatsq <- function(x){(3/500)*sum(x^2)}
sqrt(thetahatsq(data))

n <- 100

thetahat <- function(x){sqrt(thetahatsq(x))}
thetahat(data)

# Draw 1000 samples of size 100 from the fitted distribution, and compute the thetahat
tstar<-replicate(1000,thetahat(rt.scaled(n, df=5, mean=0, sd=thetahat(data)))) 
mean(tstar)

hist(tstar, breaks=20, col="lightgreen")

Now I want to compare the accuracy of coverage probabilities and the widths of 95% bootstrap confidence intervals constructed using the percentile method. I want to repeat the code above 1000 times, and in each case, check if the true value of the parameter belongs to the corresponding bootstrap confidence interval and calculate the length of each interval. Then average the resulting values.


Solution

  • Maybe the best way to bootstrap is to use base package boot. Functions boot and boot.ci are meant for what you want, and function boot.ci gives you options on the type of confidence intervals to compute, including type = "perc".

    See if the following answers your question.

    set.seed(402)    # make the results reproducible
    data <- rt.scaled(100, df=5, mean=0, sd=2)
    
    stat <- function(data, index) thetahat(data[index])
    
    hans <- function(data, statistic, R){
        b <- boot::boot(data, statistic, R = R)
        ci <- boot::boot.ci(b, type = "perc")
        lower <- ci$percent[4]
        upper <- ci$percent[5]
        belongs <- lower <= true_val && true_val <= upper
        data.frame(lower, upper, belongs)
    }
    
    true_val <- sqrt(thetahatsq(data))
    
    df <- do.call(rbind, lapply(seq_len(1000), function(i) hans(data, statistic = stat, R = n)))
    head(df)
    #     lower    upper belongs
    #1 1.614047 2.257732    TRUE
    #2 1.592893 2.144660    TRUE
    #3 1.669754 2.187214    TRUE
    #4 1.625061 2.210883    TRUE
    #5 1.628343 2.220374    TRUE
    #6 1.633949 2.341693    TRUE
    
    colMeans(df)
    #   lower    upper  belongs 
    #1.615311 2.227224 1.000000
    

    Explanation:

    • Function stat is a wrapper for your statistic of interest, to be used by boot.
    • Function hans automates the call to boot::boot and boot::boot.ci.
    • The calls to hans are made by lapply, a loop in disguise.
    • The results are returned as a list of data.frames, so we need to call do.call in order to rbind them into df.
    • The rest is standard R code.