I have the following matrix (let's call it df), and I would like to compute a bootstrapped mean and 95% confidence interval for each column, because the distribution is heavily weighted toward 0. I would like the means and CIs to be added to the bottom of the matrix as new rows. This is a small subset of the data; the true data has >600 rows, which will make the bootstrapping much more effective.
row.names V183 V184 V185 V186 V187 V188 V189 V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200 V201 V202 V203 V204 V205
1 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 NA NA
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0.022 0 NA NA NA NA NA NA
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308 0.07692308
5 0 0 0 0 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0 0 0 0 0 0 0 0 0
6 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0.07142857 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 NA NA NA NA NA NA NA NA NA NA NA NA 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806 0.03225806
12 0 0 0 0 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA
13 0 0 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
14 0 0 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0.033 0 0 0 0 0 0
I have had success creating bootstrapped values for a single column, but have not been able to write a for () loop that populates an entire row of bootstrapped values for the matrix.
The following is my code for a single row.
dfsub<-df[,1]
mean.boot <- function(dfsub, d) {
E=dfsub[d,]
return(mean(E, na.rm=T))}
b = boot(dfsub, mean.boot, R=1000)
b
Any thoughts? Would a for loop or an apply function work better?
Also, the output for the booted values gives an original value and a bias, but where is the actual bootstrapped mean?
This is a somewhat confusing question, as I'm not sure whether you're bootstrapping by row or by column, and part of the code doesn't work, specifically E=dfsub[d,] (dfsub is a single column taken with df[,1], so it cannot be indexed with two dimensions). But if you want to get bootstrapped means for each column, a simple apply should work fine, like so:
> library(boot)
> myMeanFun <- function(d, i) {
    d2 <- d[i]  # resample using the indices supplied by boot()
    return(mean(d2, na.rm = TRUE))
  }
> myBootFun <- function(d) {
    boot(d, myMeanFun, R = 1000)
  }
> lapply(df[,-1], function(x) myBootFun(x) )
$V183
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = d, statistic = myMeanFun, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 0.0186044 0.0004565272 0.008418108
$V184
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = d, statistic = myMeanFun, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 0.0186044 3.504457e-05 0.008293219
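And you can use something like this to access particular statistics (here t0, the mean of the original data, shown as "original" above; the mean of the bootstrap replicates is mean(b$t) for a boot object b, which equals original + bias):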
> sapply(df[,-1], function(x) myBootFun(x)$t0 )
V183 V184 V185 V186 V187 V188 V189
0.01860440 0.01860440 0.02114286 0.02114286 0.02621978 0.02621978 0.02621978
V190 V191 V192 V193 V194 V195 V196
0.02621978 0.02664243 0.02886264 0.02886264 0.02291026 0.02362932 0.02559843
V197 V198 V199 V200 V201 V202 V203
0.02009843 0.02650869 0.02467535 0.02631042 0.02631042 0.01861042 0.01861042
V204 V205
0.01213124 0.01213124
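To show how the printed columns relate to the fields of a boot object, here is a minimal sketch. The vector x is a made-up zero-heavy column standing in for one column of df, and mean_stat is just an illustrative name for the same statistic as myMeanFun.

```r
library(boot)

# Statistic in the form boot() expects: data first, resampled indices second.
mean_stat <- function(d, i) mean(d[i], na.rm = TRUE)

# Hypothetical zero-heavy column standing in for one column of df.
x <- c(0.0714, 0, 0.022, 0, 0, 0.077, 0, 0.0714, 0, 0, NA, 0, 0, 0)

set.seed(1)  # only so that the resampling is reproducible
b <- boot(x, mean_stat, R = 1000)

orig_mean <- b$t0               # "original": mean of the observed data
boot_mean <- mean(b$t)          # mean of the 1000 bootstrap replicates
bias      <- boot_mean - b$t0   # the "bias" column in the printed output
std_error <- sd(b$t[, 1])       # the "std. error" column

c(original = orig_mean, boot.mean = boot_mean, bias = bias, std.error = std_error)
```

So the bootstrapped mean is not printed directly; it is recovered as original + bias, or computed from the replicates stored in b$t.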
Also see the boot.ci function for confidence intervals, plus this guide might be useful to you:
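As a rough sketch of the boot.ci suggestion, the following computes a bootstrap mean and a 95% interval per column and appends them as new rows. The matrix m is made-up data standing in for the numeric columns of df, boot_summary is a hypothetical helper name, and the percentile interval (type = "perc") is only one of the interval types boot.ci offers, chosen here as an assumption.

```r
library(boot)

mean_stat <- function(d, i) mean(d[i], na.rm = TRUE)

# Bootstrap mean and 95% percentile interval for one column.
# Note: boot.ci() returns NULL when every replicate is identical
# (e.g. an all-zero column), which this sketch does not handle.
boot_summary <- function(x, R = 1000) {
  b  <- boot(x, mean_stat, R = R)
  ci <- boot.ci(b, conf = 0.95, type = "perc")
  c(boot.mean = mean(b$t),
    ci.lower  = ci$percent[4],   # lower endpoint of the percentile interval
    ci.upper  = ci$percent[5])   # upper endpoint
}

# Hypothetical stand-in for the numeric columns of df.
set.seed(1)
m <- matrix(sample(c(0, 0, 0, 0.033, 0.077, NA), 60, replace = TRUE),
            ncol = 3, dimnames = list(NULL, c("V183", "V184", "V185")))

stats  <- apply(m, 2, boot_summary)  # one column of summaries per data column
result <- rbind(m, stats)            # summary rows appended at the bottom
tail(result, 3)
```

If df is a data frame rather than a matrix, sapply(df[,-1], boot_summary) should give the same 3-row summary.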