Search code examples
pythonalgorithmsampling

Selection without replacement - by mutating the list


I am looking for the efficient function in Python that does sample selection without replacement, but by actually mutating the original list. That is, alternative to this:

random.sample(population, k)

that removes the elements from the original list as the sample is selected. List can be millions items large, and there could be potentially tens of subsequent calls to the sample function.

Ideally, I would like to do something like:

sample_size_1 = 5   
sample_size_2 = 200   
sample_size_3 = 100   
population = range(10000000)  

sample_1 = select_sample(population, sample_size_1)  #population is shrunk  
sample_2 = select_sample(population, sample_size_2)  #population is shrunk again     
sample_3 = select_sample(population, sample_size_3)  #and population is shrunk again

where population is efficiently shrunk between each call to select_sample.

I have some code that I could show here, but I am looking for hopefully something already available, or more "pythonic" than my while loops.


Solution

  • A simple way would be to shuffle your population so the initial ordering is random (if it's not already random). Then take elements from the end and remove them.

    You can get the elements by slicing population[-sample_size:] and remove them using population[-sample_size:] = [].

    import random
    
    population = list(range(100))
    
    # Shuffle population so the ordering is random.
    random.shuffle(population)
    
    for sample_size in [1, 5, 10]:
        sample = population[-sample_size:]
        population[-sample_size:] = []
        print(sample)
        # [79]
        # [66, 89, 81, 0, 38]
        # [18, 39, 90, 36, 11, 32, 63, 65, 72, 67]
    

    You could also use population.pop() if you just wanted to remove a single element at a time (i.e. if sample_size was 1).

    A function to do this would then simply be (assuming that your population is already shuffled):

    def select_sample(pop, size):
        x = pop[-size:]
        pop[-size:] = []
        return x