Tags: r, least-squares, statistics-bootstrap

Bootstrapping regression coefficients from random subsets of data


I’m attempting to perform a regression calibration on two variables using the york() function in the IsoplotR package. I would like to estimate a confidence interval for the slope coefficient of this model by resampling; however, instead of the typical bootstrap shown below, I’d like each iteration to use only 75% of the data, selected at random. So far, using the following sample data, I have managed to bootstrap the slope coefficient returned by york():

library(boot)
library(IsoplotR)

X <- c(9.105,8.987,8.974,8.994,8.996,8.966,9.035,9.215,9.239,
         9.307,9.227,9.17, 9.102)
Y <- c(28.1,28.9,29.6,29.5,29.0,28.8,28.5,27.3,27.1,26.5,
         27.0,27.5,28.4)
n <- length(X)
sX <- X*0.02
sY <- Y*0.05
rXY <- rep(0.8,n)
dat <- cbind(X,sX,Y,sY,rXY)
fit <- york(dat)

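# Statistic for boot(): refit the York regression on the resampled rows
boot.test <- function(data, indices){
    resampled <- data[indices, ]
    mod <- york(resampled)
    # york() reports b as the slope followed by its standard error; keep the slope only
    return(mod$b[1])
}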

result <- boot(data=dat, statistic = boot.test, R=1000) 
boot.ci(result, type = 'bca')

...but I'm not really sure where to go from here. Any help pointing me in the right direction would be greatly appreciated. I’m new to R, so I apologize if the question is ambiguous. Thanks.


Solution

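  • Based on the package documentation, you should be able to use the ran.gen argument, with sim="parametric", to draw each replicate with a custom function. Here each replicate is a fixed percentage of the observations, chosen at random without replacement. Two details of boot() matter in this mode: ran.gen is called with the data and whatever is supplied as mle, so the percentage has to go in through mle (extra named arguments are forwarded to statistic, not to ran.gen), and statistic is called with the generated data only, without an indices argument. Something like the following should accomplish what you want:

    result <- boot(
        data=dat,
        # with sim="parametric", statistic receives only the generated data
        statistic=function(data) york(data)$b[1],
        R=1000,
        sim="parametric",
        # ran.gen(data, mle): keep a random subset of round(n*percent) rows
        ran.gen=function(data, percent){
            n <- nrow(data)
            data[sample(n, round(n*percent)), , drop=FALSE]
        },
        mle=0.75)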
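
The same idea can also be written without boot() at all, which may make it easier to see what each replicate contains. The following is only a minimal sketch in base R: subsample_stat and ols_slope are hypothetical helper names, not part of boot or IsoplotR, and an ordinary least-squares slope from lm() stands in for york() so that the example runs without extra packages.

```r
# Sketch: repeated 75% subsampling without replacement, base R only.
set.seed(1)  # arbitrary seed, used only to make the run reproducible

X <- c(9.105, 8.987, 8.974, 8.994, 8.996, 8.966, 9.035, 9.215, 9.239,
       9.307, 9.227, 9.17, 9.102)
Y <- c(28.1, 28.9, 29.6, 29.5, 29.0, 28.8, 28.5, 27.3, 27.1, 26.5,
       27.0, 27.5, 28.4)
dat <- cbind(X = X, Y = Y)

# Hypothetical helper: apply `stat` to R random subsets, each holding
# round(percent * n) distinct rows of `data`.
subsample_stat <- function(data, stat, percent = 0.75, R = 1000) {
    n <- nrow(data)
    m <- round(n * percent)
    replicate(R, {
        keep <- sample(n, size = m)          # m row indices, no repeats
        stat(data[keep, , drop = FALSE])
    })
}

# Stand-in statistic (assumption): OLS slope of Y on X instead of york()
ols_slope <- function(d) unname(coef(lm(d[, "Y"] ~ d[, "X"]))[2])

slopes <- subsample_stat(dat, ols_slope)

# Empirical 2.5% and 97.5% quantiles of the subsample slopes
quantile(slopes, c(0.025, 0.975))
```

To use the York fit instead, pass the five-column dat from the question together with something like function(d) york(d)$b[1] as stat, assuming the first element of b is the slope. Note that slopes from 75% subsets drawn without replacement do not have the same spread as ordinary bootstrap replicates, so the quantiles above are probably best read as a description of how much the slope moves between subsets rather than as a standard bootstrap confidence interval.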