Search code examples
rsimulationcontingency

How work the p-value simulation in the chisq.test() and fisher.test()


I have contingency tables RxC. How chisq.test and fisher.test functions work when called with arguments simulate.p.value = TRUE, B = 5000 ?

I'm using the code below to validate the association (or independence) in the contingency table DATA:

 chisq.test(DATA, simulate.p.value = TRUE, B = 5000, correct = FALSE)
 fisher.test(DATA,hybrid = TRUE, simulate.p.value = TRUE)

I know that the test can perform Monte Carlo simulations to estimate the p-value of the test, what I want to know is how these simulations are done internally, that is, if the simulations are made to arrive at a normal distribution or another distribution to deliver the p-value at the end of the test?


Solution

  • In chisq.test and fisher.test p-value simulation is carried out in a non-parametric way without prior distribution assumption. Please see the excerpt of source code for chisq.test:

    if (simulate.p.value) {
        setMETH()
        nx <- length(x)
        sm <- matrix(sample.int(nx, B * n, TRUE, prob = p), 
                     nrow = n)
        ss <- apply(sm, 2L, function(x, E, k) {
          sum((table(factor(x, levels = 1L:k)) - E)^2/E)
        }, E = E, k = nx)
        PARAMETER <- NA
        PVAL <- (1 + sum(ss >= almost.1 * STATISTIC))/(B + 1)
    }