Search code examples
algorithmoptimizationstatisticsmontecarlo

Efficient way to generate random contingency tables?


What is an efficient way to generate a random contingency table? A contingency table is defined as a rectangular matrix such that the sum of each row is fixed, and the sum of each column is fixed, but the individual elements may be anything as long as the sum of each row and column is correct.

Note that it's very easy to generate random contingency tables, but I'm looking for something more efficient than the naive algorithm.


Solution

  • Looking at the code of the networksis package for R might be helpful. I believe that efficient computation requires fancy Markov Chain sequential importance resampling techniques, so you might want to avoid reimplementing this if you can avoid it.

    Edit: The relevant paper is Chen, Diaconis, Holmes, and Liu (2005). In the words of the authors, "[o]ur method compares favorably with other existing Monte Carlo- based algorithms, and sometimes is a few orders of magnitude more efficient."