Search code examples
rtidyversepurrrstatistics-bootstrapmodelr

Bootstrapped standard errors and p-value from weighted mann-whitney test


I would like to bootstrap the p-value and standard errors from weighted Mann-Whitney U test.

I can run the test as: weighted_mannwhitney(c12hour ~ c161sex + weight, efc) which works fine, but am not entirely sure how I can run a bootstrapped version of the same to obtain a bootstrapped p-value for instance.

library(sjstats) # weighted Mann-Whitney
library(tidyverse) # main workflow, which has purrr and forcats (IIRC)
# library(broom) # for tidying model output, but not directly loaded
library(modelr) # for bootstrap


data(efc)
efc$weight <- abs(rnorm(nrow(efc), 1, .3))

# weighted Mann-Whitney-U-test ----
weighted_mannwhitney(c12hour ~ c161sex + weight, efc)


# Bootstrapping

set.seed(1000) # for reproducibility

boot_efc <- efc %>% bootstrap(1000) 


# Throws error!
boot_efc %>% 
  dplyr::mutate(c12hour = map(strap, ~weighted_mannwhitney(c12hour ~ c161sex + weight, data = .)),
                tidy = map(c12hour, broom::tidy)) -> boot_efc_out

SIDE NOTE: The package for the weighted Mann-Whitney test has its own bootstrap function which can be used as shown below to obtain bootstrapped standard error and bootstrapped p-value, but this is running a different function (mean), I could not adapt that for the weighted Mann-Whitney. Not sure if this helps

# or as tidyverse-approach
if (require("dplyr") && require("purrr")) {
  bs <- efc %>%
    bootstrap(100) %>%
    mutate(
      c12hour = map_dbl(strap, ~mean(as.data.frame(.x)$c12hour, na.rm = TRUE))
    )

  # bootstrapped standard error
  boot_se(bs, c12hour)

    # bootstrapped p-value
  boot_p(bs, c12hour)
}

Solution

  • This should do the trick:

      library(sjstats) # weighted Mann-Whitney
    library(tidyverse) # main workflow, which has purrr and forcats (IIRC)
    # library(broom) # for tidying model output, but not directly loaded
    library(modelr) # for bootstrap
    #> 
    #> Attaching package: 'modelr'
    #> The following objects are masked from 'package:sjstats':
    #> 
    #>     bootstrap, mse, rmse
    
    
    data(efc)
    efc$weight <- abs(rnorm(nrow(efc), 1, .3))
    
    # weighted Mann-Whitney-U-test ----
    mw_full <- weighted_mannwhitney(c12hour ~ c161sex + weight, efc)
    
    
    # Bootstrapping
    
    set.seed(1000) # for reproducibility
    
    boot_efc <- efc %>% bootstrap(1000) 
    
    tmp <- boot_efc %>% 
      dplyr::mutate(c12hour = map_dbl(strap, 
                                  ~sjstats:::weighted_mannwhitney.formula(c12hour ~ c161sex + weight, 
                                                                          data = .$data[.$idx, ])$estimate))
    
    boot_p(tmp$c12hour)
    #   term   p.value
    # 1    x 0.0316659
    

    Created on 2022-09-07 by the reprex package (v2.0.1)

    Note that one of the problems comes from the way weighted_mannwhitney() works inside the map() function. When it's called, it calls the default method rather than the formula method and then generates an error. You can call the formula method for the statistic as I did in the code. The other problem is that each strap element of boot_efc isn't a single data frame, it has two elements data and idx where the idx element are the bootstrapped observation numbers. So, you need to use .$data[.$idx, ] as the bootstrapped data. The rest follows along from the example you posted.