Search code examples
pythonnumpyrandomvectorizationprobability

Vectorize sampling from a multidimensional array


I have a numpy array of shape D x N x K.

I need a resulting D x N array of random elements out of K classes, where for each index [d, n] there is a different probability vector for the classes, indicated by the third axis.

The numpy documentation for np.random.choice only allows 1D array for probabilities.

Can I vectorize this type of sampling, or do I have to use a for loop as follows:

# input_array of shape (D, N, K)
# output_array of shape (D, N)

for d in range(input_array.shape[0]):
    for n in range(input_array.shape[1]):
        probabilities = input_array[d, n]
        element = np.random.choice(K, p=probabilities)
        output_array[d, n] = element

I would have loved if there is a function such as

output_array = np.random.choice(input_array, K, probability_axis=-1)

Edit: Managed to find a "hand engineered" solution here.


Solution

  • Neither np.random.choice nor np.random.default_rng().choice support broadcasting of probabilities in the way that you intend. However, you can cobble together something that works similarly using np.cumsum:

    sprob = input_array.cumsum(axis=-1, dtype=float)
    sprob /= sprob[:, :, -1:]
    output_array = (np.random.rand(D, N, 1) > sprob).argmin(-1)
    

    Unfortunately, np.searchsorted does not support multi-dimensional lookup either (probably for related reasons).