Search code examples
rstatistics-bootstrap

Bootstrapping multiple columns with R


I'm relatively new at R and I'm trying to build a function which will loop through columns in an imported table and produce an output which consists of the means and 95% confidence intervals. Ideally it should be possible to bootstrap columns with different sample sizes, but first I would like to get the iteration working. I have something that sort-of works, but I can't get it all the way there. This is what the code looks like, with the sample data and output included:

#cdata<-read.csv(file.choose(),header=T)#read data from selected file, works, commented out because data is provided below
#cdata #check imported data

#Sample Data
#   WALL NRPK CISC WHSC LKWH YLPR
#1    21    8    1    2    2    5
#2    57    9    3    1    0    1
#3    45    6    9    1    2    0
#4    17   10    2    0    3    0
#5    33    2    4    0    0    0
#6    41    4   13    1    0    0
#7    21    4    7    1    0    0
#8    32    7    1    7    6    0
#9     9    7    0    5    1    0
#10    9    4    1    0    0    0

x<-cdata[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant species

i<-nrow(x) #count number of rows for bootstrapping 
g<-ncol(x) #count number of columns for iteration

#build bootstrapping function, this works for the first column but doesn't iterate

bootfun <- function(bootdata, reps) {

  boot <- function(bootdata){

    s1=sample(bootdata, size=i, replace=TRUE)
    ms1=mean(s1)
    return(ms1)

  } # a single bootstrap

  bootrep <- replicate(n=reps, boot(bootdata))

  return(bootrep)

} #replicates bootstrap of "bootdata" "reps" number of times and outputs vector of results

cvr1 <- bootfun(x$YLPR,50000) #have unsuccessfully tried iterating the location various ways (i.e. x[i])
cvrquantile<-quantile(cvr1,c(0.025,0.975))
cvrmean<-mean(cvr1)
vec<-c(cvrmean,cvrquantile) #puts results into a suitable form for output
vecr<-sapply(vec,round,1) #rounds results
vecr

      2.5% 97.5% 
 28.5  19.4  38.1 

#apply(x[1:g],2,bootfun) ##doesn't work in this case

#desired output:

#Species    Mean LowerCI UpperCI
#WALL       28.5    19.4      38.1
#NRPK       6.1 4.6    7.6
#YLPR       0.6 0.0    1.6

I've also tried this using the boot package, and it works beautifully to iterate through the means but I can't get it to do the same with the confidence intervals. The "ordinary" code above also has the advantage that you can easily retrieve the bootstrapping results, which might be used for other calculations. For the sake of completeness here is the boot code:

#Bootstrapping using boot package
library(boot)
#data<-read.csv(file.choose(),header=TRUE) #read data from selected file
#x<-data[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant columns
#x #check data

#Sample Data

#  WALL NRPK LKWH YLPR
#1    21    8    2    5
#2    57    9    0    1
#3    45    6    2    0
#4    17   10    3    0
#5    33    2    0    0
#6    41    4    0    0
#7    21    4    0    0
#8    32    7    6    0
#9     9    7    1    0
#10    9    4    0    0

i<-nrow(x) #count number of rows for resampling 
g<-ncol(x) #count number of columns to step through with bootstrapping
boot.mean<-function(x,i){boot.mean<-mean(x[i])} #bootstrapping function to get the mean

z<-boot(x, boot.mean,R=50000) #bootstrapping function, uses mean and number of reps
boot.ci(z,type="perc") #derive 95% confidence intervals
apply(x[1:g],2, boot.mean) #bootstrap all columns

#output:
#WALL NRPK LKWH YLPR 
#28.5  6.1  1.4  0.6 

I've gone through all of the resources I can find and can't seem to get things working. What I would like for output would be the bootstrapped means with the associated confidence intervals for each column. Thanks!


Solution

  • Note: apply(x[1:g],2, boot.mean) #bootstrap all columns doesn't do any bootstrap. You are simply calculating the mean for each column.

    For bootstrap mean and confidence interval, try this:

    apply(x,2,function(y){ 
       b<-boot(y,boot.mean,R=50000); 
       c(mean(b$t),boot.ci(b,type="perc", conf=0.95)$percent[4:5])
    })