Search code examples
rbootstrap-4montecarlo

Bootstrapping/Monte Carlo Simulation in R


I am trying to follow this test:

Stat test

Suppose I have the following data:

set.seed(123)

active_MJO   <-c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L, 
                86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L, 
                77L, 44L, 45L, 93L)

inactive_MJO <-c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L, 
               136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L, 
               135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)

I dont know how to implement the above test in R.

I have tried the following but I am not sure if this is correct.

sig.test <- function (x){
a <- sample(active_MJO)
b <- sample(inactive_MJO)
sum(a > b)
}

runs <- 1000
sim <- sum(replicate(runs,sig.test(dat))+1)/(runs+1)

I think the above is not correct. Where can I put the 950/1000 condition?

Apologies, I am new to bootstrapping/Monte Carlo test.

I'll appreciate any help on this.

Sincerely, Lyndz


Solution

  • First, it's important to note that they are sampling 30 frequency pairs. Since it's bootstrapping, those samples will be with replacement.

    Then they compare the average active to average inactive. This is equivalent to:

    1. comparing the sum of the active against the sum of the inactive from the 30 pairs, or
    2. comparing the sum of the differences within each of the 30 pairs to zero.

    They repeat the process 1000 times then compare the results of the 1000 comparisons to 950.

    The following code performs #2:

    set.seed(123)
    
    active_MJO   <-c(6L, 2L, 11L, 20L, 62L, 15L, 2L, 51L, 58L, 100L, 45L, 44L, 49L, 
                     86L, 28L, 1L, 1L, 40L, 79L, 99L, 86L, 50L, 9L, 78L, 45L, 100L, 
                     77L, 44L, 45L, 93L)
    inactive_MJO <-c(83L, 170L, 26L, 66L, 156L, 40L, 29L, 72L, 109L, 169L, 153L, 
                     136L, 169L, 133L, 153L, 13L, 24L, 148L, 121L, 80L, 125L, 21L, 
                     135L, 155L, 161L, 171L, 124L, 177L, 167L, 162L)
    
    diff_MJO <- active_MJO - inactive_MJO
    sim <- sum(replicate(1e3, sum(sample(diff_MJO, 30, replace = TRUE)) > 0))
    
    > sim
    [1] 0
    
    

    In this case, none of the 1000 replications resulted in an average active_MJO that was greater than the average inactive_MJO. This is unsurprising after plotting the histogram of sums of bootstrapped differences:

    diff_MJO <- replicate(1e5, sum(sample(diff_MJO, 30, replace = TRUE)))
    hist(diff_MJO)
    

    enter image description here