Search code examples
pythonnumpyrandomvectorizationsample

Efficiently draw random samples without replacement from an array in python


I need to draw random samples without replacement from a 1D NumPy array. However, performance is critical since this operation will be repeated many times.

Here’s the code I’m currently using:

import numpy as np

# Example array
array = np.array([10, 20, 30, 40, 50])

# Number of samples to draw
num_samples = 3

# Draw samples without replacement
samples = np.random.choice(array, size=num_samples, replace=False)

print("Samples:", samples)

While this works for one sample, it requires a loop to generate multiple samples, and I believe there could be a way to optimize or vectorize this operation to improve performance when sampling multiple times.

  • Is there a way to vectorize or otherwise optimize this operation?
  • Would another library (e.g., TensorFlow, PyTorch) provide better performance for this task?
  • Are there specific techniques for bulk sampling that avoid looping in Python?

Solution

  • The code below generates random samples of a list without replacement in a vectorized manner. This solution is particularly useful when the number of simulations is large and the number of samples per simulation is low.

    import numpy as np
    
    def draw_random_samples(len_deck, n_simulations, n_cards):
        
        """
        Draw random samples from the deck.
        
        Parameters
        ----------
        len_deck : int
            Length of the deck.
        n_simulations : int
            How many combinations of cards are generated. (Doubles could occur.)
        n_cards : int
            How many cards to draw from the deck per simulation. (All cards are unique.)
        
        Returns
        -------
        indices : array-like
            Random indices of the deck. 
        
        """
        
        indices = np.random.randint(0, len_deck, (1, n_simulations))
    
        for i in range(1, n_cards):
            new_indices = np.random.randint(0, len_deck-i, n_simulations)
            new_indices += np.sum(new_indices >= indices - np.arange(i)[:,None], axis=0)
    
            indices = np.vstack((indices, new_indices))
            indices = np.sort(indices, axis=0)
        
        return indices.T