I am using the `boot` function in R to do a bootstrap. Instead of passing my dataset directly as the `data` argument, I pass an index vector, which the statistic function uses to merge two data tables and compute my result. It seems that `boot` uses the result of the first bootstrap as the actual sampled data (i.e., the empirical value). Is this correct? When I do the bootstrap manually I get similar results, although I would expect `boot` to use `data` as the original data. I am confused: the confidence intervals make sense, but I would expect this not to work, unless for the reason I have mentioned.
In short, I have an index vector:

```r
x <- 1:100
```
and my statistic function:

```r
library(boot)
library(data.table)

myboot <- function(data, indeces) {
  toselect <- data[indeces]            # lets boot select the bootstrap sample
  toselect <- as.data.table(toselect)
  # this is where I use the index for the merge with my real data
  t <- merge(toselect, mydataset, allow.cartesian = TRUE)
  return(nrow(t))
}

b <- boot(data = x, statistic = myboot, R = 1000)
```
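To make the setup reproducible without the real `mydataset`, here is a minimal sketch with a hypothetical stand-in table (250 rows, with an `id` column whose values 1 to 100 each appear 1 to 4 times). It swaps data.table's `merge()` for base R `merge()` on data frames, so it needs only the boot package, and it names the join column `id` explicitly instead of relying on whatever column name `as.data.table()` picks. The last line checks that the reported "original" value is the statistic evaluated on `seq_along(x)`, i.e. on every id exactly once.

```r
library(boot)  # recommended package, ships with R

set.seed(42)

# Hypothetical stand-in for `mydataset`: ids 1..100, each repeated 1 to 4 times
mydataset <- data.frame(id = rep(1:100, times = rep(1:4, 25)),
                        value = runif(250))

x <- 1:100  # index vector passed to boot() as `data`

myboot <- function(data, indeces) {
  toselect <- data.frame(id = data[indeces])  # resampled ids
  # Inner join: every sampled id picks up all of its matching rows
  joined <- merge(toselect, mydataset, by = "id")
  nrow(joined)
}

b <- boot(data = x, statistic = myboot, R = 1000)
b

# The "original" column is the statistic on the unresampled index 1..100
b$t0 == myboot(x, seq_along(x))
```

Base `merge()` has no `allow.cartesian` guard, so duplicate ids in the stand-in table simply multiply the joined rows, which is the behaviour the original data.table call opts into.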
The results I get:

```
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = x, statistic = myboot, R = 1000)

Bootstrap Statistics :
    original        bias    std. error
t1* 397.2477 -0.03669725      11.70803

> boot.ci(b, type="bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = b, type = "bca")

Intervals :
Level       BCa
95%   (375.2, 421.1 )
```
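The question mentions getting similar results with a manual bootstrap. A minimal sketch of such a manual loop follows, on the same kind of hypothetical stand-in data (base R `merge()`, an assumed `id` join column, and a hypothetical `count_rows()` statistic). It compares the manual standard error with the one `boot()` reports; the two should agree up to Monte Carlo noise, because both resample the 100 ids with replacement.

```r
library(boot)

set.seed(123)

# Hypothetical stand-in for `mydataset` (ids 1..100, each repeated 1 to 4 times)
mydataset <- data.frame(id = rep(1:100, times = rep(1:4, 25)),
                        value = runif(250))
x <- 1:100

# Statistic: number of rows produced by joining the sampled ids to mydataset
count_rows <- function(data, indeces) {
  nrow(merge(data.frame(id = data[indeces]), mydataset, by = "id"))
}

# Manual bootstrap: resample positions of x with replacement R times
R <- 1000
manual <- replicate(R, count_rows(x, sample.int(length(x), replace = TRUE)))

# The same statistic through boot()
b <- boot(data = x, statistic = count_rows, R = R)

c(manual_se = sd(manual), boot_se = sd(b$t[, 1]))
quantile(manual, c(0.025, 0.975))  # simple percentile interval
```

The percentile interval from the manual loop is not the BCa interval shown above; `boot.ci()` applies bias and acceleration corrections on top of the same kind of replicates.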
Yes, you are correct.
The `statistic` function has the following requirement (according to the `?boot` help page):
> ... In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample. Further, if predictions are required, then a third argument is required which would be a vector of the random indices used to generate the bootstrap predictions.
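A minimal check of the first two points of that description, using a toy mean statistic rather than the merge: the hypothetical `stat()` function asserts that its first argument is always the unchanged original vector, and the hypothetical counter `calls` records how often `boot()` invokes it (once for the original value, then once per replicate).

```r
library(boot)

set.seed(1)
x <- 1:100
calls <- 0L  # hypothetical counter of how often boot() calls the statistic

stat <- function(d, i) {
  # boot() always hands over the original data unchanged as `d`
  stopifnot(identical(d, x))
  calls <<- calls + 1L
  mean(d[i])  # `i` is the vector of indices defining the bootstrap sample
}

b <- boot(data = x, statistic = stat, R = 200)
calls           # one call for the original value plus one per replicate
b$t0 == mean(x) # the original value uses every observation exactly once
```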
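Since your data vector consists of the numbers `1:100`, the indices passed as the second argument are also drawn from `1:100` (with replacement), and indexing `1:100` by a vector of positions returns those same positions. In other words, your `data[indeces]` line will be identical to `indeces`, so the merge receives exactly the resampled ids and produces the same result as if you had used `indeces` directly.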
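A quick sketch of that identity: one bootstrap draw of positions (the same kind of vector `boot()` passes as the second argument) is used to index `1:100`, and the result is the draw itself. For contrast, a hypothetical id vector `ids` that is not `1:n` returns the resampled ids rather than their positions.

```r
x <- 1:100
set.seed(7)

# One bootstrap draw of positions, sampled with replacement
idx <- sample.int(length(x), size = length(x), replace = TRUE)

# With x = 1:100, x[i] == i, so indexing returns the draw unchanged
same <- identical(x[idx], idx)
print(same)

# Hypothetical ids that are not 1..n: indexing returns ids, not positions
ids <- x * 10L
print(identical(ids[idx], idx))
```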