I am looking for the efficient function in Python that does sample selection without replacement, but by actually mutating the original list. That is, alternative to this:
random.sample(population, k)
that removes the elements from the original list as the sample is selected. List can be millions items large, and there could be potentially tens of subsequent calls to the sample function.
Ideally, I would like to do something like:
sample_size_1 = 5
sample_size_2 = 200
sample_size_3 = 100
population = range(10000000)
sample_1 = select_sample(population, sample_size_1) #population is shrunk
sample_2 = select_sample(population, sample_size_2) #population is shrunk again
sample_3 = select_sample(population, sample_size_3) #and population is shrunk again
where population
is efficiently shrunk between each call to select_sample.
I have some code that I could show here, but I am looking for hopefully something already available, or more "pythonic" than my while loops.
A simple way would be to shuffle your population so the initial ordering is random (if it's not already random). Then take elements from the end and remove them.
You can get the elements by slicing population[-sample_size:]
and remove them using population[-sample_size:] = []
.
import random
population = list(range(100))
# Shuffle population so the ordering is random.
random.shuffle(population)
for sample_size in [1, 5, 10]:
sample = population[-sample_size:]
population[-sample_size:] = []
print(sample)
# [79]
# [66, 89, 81, 0, 38]
# [18, 39, 90, 36, 11, 32, 63, 65, 72, 67]
You could also use population.pop()
if you just wanted to remove a single element at a time (i.e. if sample_size
was 1).
A function to do this would then simply be (assuming that your population is already shuffled):
def select_sample(pop, size):
x = pop[-size:]
pop[-size:] = []
return x