Possible Duplicate:
Bootstrap a large data set
I would like to bootstrap a large two-way data set which contains multiple column and row variables. I have to preserve both row and column variables. The result should be a list containing a bootstrap of all column variables for each row variable. I am providing the required code to answer my question but i think it is not elegant. I would appreciate a better and faster code. The following is a simplified re-creation of the two-way data set:
rm(list=ls())
data <- 1:72
Create a two way matrix data:
charDataDiff <- matrix(data, nrow=9,ncol=8)
varNames <- c("A", "B", "C","A", "B", "C","A", "B", "C")
Add a character column to the charDataDiff matrix:
charDataDiff <- cbind(varNames ,data.frame(charDataDiff))
Add column names:
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")
Separate the data using the row-variable "patchId" as the criteria. This creates three lists: one for each Variable
idColor <- c("A", "B", "C")
(patchSpectrum <- lapply(idColor, function(idColor) charDataDiff[charDataDiff$patchId==idColor,]))
Created the function sampleBoot to sample the patchSpectrum
sampleBoot <- function(nbootstrap=2, patch=3){
return(lapply(1:nbootstrap, function(i)
{patchSpectrum[[patch]][sample(1:nrow(patchSpectrum[[patch]]),replace=TRUE),]}))}
The list "k" answers my question. However, I think my code is slow for a large data set and large bootstrap. I am only bootstrapping 10 iteration for three row variables. A faster more elegant code is appreciated.
numBoots <- 10
for (i in 1: numBoots)
k <- lapply(1:3, function(n)
do.call(rbind, lapply(sampleBoot(i, n), function(x) apply(x[-1], 2, median))))
k
What I can provide is a clean rewrite of your code:
varNames
column, instead used by
to split your data as @KenWilliams had suggested in your previous question,replicate
instead of lapply
,for
loop I pointed out in my comment.It should run faster that way; if not I'd suggest you try profiling it.
charDataDiff <- matrix(1:72, nrow = 9, ncol = 8)
colnames(charDataDiff) <- c("s380", "s390", "s400", "s410",
"s420", "s430", "s440", "s450")
varNames <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
patchSpectrum <- by(charDataDiff, varNames, data.frame)
sampleOne <- function(x) x[sample(seq_len(nrow(x)), replace = TRUE), ]
sampleBoot <- function(x, n) replicate(n, sampleOne(x), simplify = FALSE)
applyMedian <- function(l) do.call(rbind, lapply(l, apply, 2, median))
k <- lapply(lapply(patchSpectrum, sampleBoot, n = 10), applyMedian)