Search code examples
rfor-loopstatistics-bootstrapweighted-average

How to bootstrap weighted mean in a loop in r


I would like to run a bootstrap of a weighted mean in a for loop (I don’t think I can use ‘apply’ because it concerns a weighted mean). I would only need to store the resulting standard errors in a dataframe. Another post provided the code for how to calculate the weighted mean in a bootstrap (bootstrap weighted mean in R), and works perfectly:

library(boot)
 
mtcarsdata = mtcars #dataframe for data 
mtcarsweights = rev(mtcars) #dataframe for weights

samplewmean <- function(d, i, j) {
  d <- d[i, ]
  w <- j[i, ]
  return(weighted.mean(d, w))   
}

results_qsec <- sd(boot(data= mtcarsdata[, 6, drop = FALSE], 
                     statistic = samplewmean, 
                     R=10000, 
                     j = mtcarsweights[, 6 , drop = FALSE])[[2]], na.rm=T)
results_qsec

To then run it in a loop, I tried:

outputboot = matrix(NA, nrow=11, ncol=1)
for (k in 1:11){
  outputboot[1,k] = sd(boot(data= mtcarsdata[, k, drop = FALSE], 
                       statistic = samplewmean, 
                       R=10000, 
                       j = mtcarsweights[, k, drop = FALSE])[[2]], na.rm=T)
}
outputboot

But this doesn't work. The first output isn’t even correct. I suspect the code can’t work with two iterators: one for looping over the columns and the other for the sampling with replacement.

I hope anyone could offer some help.


Solution

  • This will calculate the standard deviation of all bootstraps for each column of the table mtcarsdata weighted by mtcarsweights.

    Since we can calculate the result in one step, we can use apply and friends (here: purrr:map_dbl)

    library(boot)
    library(purrr)
    
    set.seed(1337)
    
    mtcarsdata <- mtcars # dataframe for data
    mtcarsweights <- rev(mtcars) # dataframe for weights
    
    samplewmean <- function(d, i, j) {
      d <- d[i, ]
      w <- j[i, ]
      return(weighted.mean(d, w))
    }
    
    mtcarsdata %>%
      ncol() %>%
      seq() %>%
      map_dbl(~ {
        # .x is the number of the current column
        sd(boot(
          data = mtcarsdata[, .x, drop = FALSE],
          statistic = samplewmean,
          R = 10000,
          j = mtcarsweights[, .x, drop = FALSE]
        )[[2]], na.rm = T)
      })
    #>  [1]  0.90394218  0.31495232 23.93790468  6.34068205  0.09460257  0.19103196
    #>  [7]  0.33131814  0.07487754  0.07745781  0.13477355  0.27240347
    

    Created on 2021-12-10 by the reprex package (v2.0.1)