My problem is as follows: Firstly, I have to create 1000 bootstrap samples of size 100 of a "theta hat". I have a random variable X which follows a scaled t_5-distribution. The following code creates 1000 bootstrap samples of theta hat:
library("metRology", lib.loc="~/R/win-library/3.4")
# Draw some data
data <- rt.scaled(100, df=5, mean=0, sd=2)
thetahatsq <- function(x){(3/500)*sum(x^2)}
sqrt(thetahatsq(data))
n <- 100
thetahat <- function(x){sqrt(thetahatsq(x))}
thetahat(data)
# Draw 1000 samples of size 100 from the fitted distribution, and compute the thetahat
tstar<-replicate(1000,thetahat(rt.scaled(n, df=5, mean=0, sd=thetahat(data))))
mean(tstar)
hist(tstar, breaks=20, col="lightgreen")
Now I want to compare the accuracy of coverage probabilities and the widths of 95% bootstrap confidence intervals constructed using the percentile method. I want to repeat the code above 1000 times, and in each case, check if the true value of the parameter belongs to the corresponding bootstrap confidence interval and calculate the length of each interval. Then average the resulting values.
Maybe the best way to bootstrap is to use base package boot
. Functions boot
and boot.ci
are meant for what you want, and function boot.ci
gives you options on the type of confidence intervals to compute, including type = "perc"
.
See if the following answers your question.
set.seed(402) # make the results reproducible
data <- rt.scaled(100, df=5, mean=0, sd=2)
stat <- function(data, index) thetahat(data[index])
hans <- function(data, statistic, R){
b <- boot::boot(data, statistic, R = R)
ci <- boot::boot.ci(b, type = "perc")
lower <- ci$percent[4]
upper <- ci$percent[5]
belongs <- lower <= true_val && true_val <= upper
data.frame(lower, upper, belongs)
}
true_val <- sqrt(thetahatsq(data))
df <- do.call(rbind, lapply(seq_len(1000), function(i) hans(data, statistic = stat, R = n)))
head(df)
# lower upper belongs
#1 1.614047 2.257732 TRUE
#2 1.592893 2.144660 TRUE
#3 1.669754 2.187214 TRUE
#4 1.625061 2.210883 TRUE
#5 1.628343 2.220374 TRUE
#6 1.633949 2.341693 TRUE
colMeans(df)
# lower upper belongs
#1.615311 2.227224 1.000000
Explanation:
stat
is a wrapper for your statistic of interest, to be used by boot
.hans
automates the call to boot::boot
and boot::boot.ci
.hans
are made by lapply
, a loop in disguise.do.call
in order to rbind
them into df
.R
code.