Search code examples
pythonnumpyrandommontecarlo

How to sample from multiple categorical distributions using Python


Let P be an array where each row sums up to 1. How can I generate a matrix A where

  • A has the same dimensions as P, and has A_{ij} equal to 1 with probability P_{ij}

  • A has exactly one entry equal to 1 in every row, with all other entries zero

How can I do this in Numpy or Scipy?

I can do it using for-loops, but that's obviously slow. Is there a way to use Numpy to make it efficient? Or Numba?


Solution

  • This follows Wikipedia.

    import numpy.random as rnd
    import numpy as np
    
    A_as_numbers = np.argmax(np.log(P) + rnd.gumbel(size=P.shape), axis=1)
    A_one_hot = np.eye(P.shape[1])[A_as_numbers].reshape(P.shape)
    

    Tested it on:

    P = np.matrix([[1/4, 1/4, 1/4, 1/4], [1/3,1/3,1/6,1/6]])
    

    Got:

    array([[ 1.,  0.,  0.,  0.],
           [ 0.,  1.,  0.,  0.]])