I'm trying to learn more about `yield` in Python, and I found some code that uses `yield` to implement the reservoir sampling algorithm:

    from random import Random

    def RandomSelect(knum, rand=None):
        ''' (int, func) -> list
        Reservoir Sampling implementation
        '''
        selection = None
        k_elems_list = []
        count = 0
        if rand is None:
            rand = Random()
        while True:
            item = yield selection
            if len(k_elems_list) < knum:
                k_elems_list.append(item)
            else:
                # Randomly replace elements in the reservoir with a decreasing probability
                r = rand.randint(0, count)
                if r < knum:
                    k_elems_list[r] = item
            count += 1
        print(k_elems_list)

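For comparison, here is the same reservoir-sampling logic (often called Algorithm R) written as an ordinary function that consumes a whole iterable in one call. This is a sketch, not the code above; the name `reservoir_sample` and its `rng` parameter are hypothetical. Like the `randint(0, count)` check above, it keeps each new item with probability k/(count + 1).

```python
import random


def reservoir_sample(iterable, k, rng=None):
    """Return up to k items chosen uniformly at random from iterable.

    Hypothetical helper for comparison; not part of the original code.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for count, item in enumerate(iterable):
        if count < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in 0..count inclusive; only slots below k
            # exist, so the new item is kept with probability k / (count + 1).
            r = rng.randint(0, count)
            if r < k:
                reservoir[r] = item
    return reservoir


print(reservoir_sample(range(1, 9), 3))
```

The generator version does the same work, but receives items one at a time through `send()` instead of pulling them from an iterable.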
To break out of the `while` loop, I added this check right after `item = yield selection`:

        if item == -1:  # reached the end of the data, so break
            break

Question 1: Is there a better way to break out of the `while` loop?
To call the function `RandomSelect`, I use:

    myList = [1, 2, 3, 4, 5, 6, 7, 8, -1]
    cr = RandomSelect(3)
    next(cr)  # advance to the yield statement, otherwise I can't call send
    try:
        for val in myList:
            cr.send(val)
    except StopIteration:
        pass
    finally:
        del cr

I have to catch the `StopIteration` exception explicitly.

Question 2: Is there a better way to swallow the `StopIteration` in this code?
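As for why the `StopIteration` appears at all: once the generator breaks out of its loop and the function body finishes, the `send()` call that delivered the final value raises `StopIteration` in the caller. The minimal sketch below, using a hypothetical `stops_after_two()` coroutine, shows this. It also shows `contextlib.suppress` (Python 3.4+), a standard-library context manager that expresses "catch and ignore" more compactly than `try`/`except`/`pass`.

```python
from contextlib import suppress


def stops_after_two():
    """Hypothetical coroutine that finishes after receiving two values."""
    received = []
    while len(received) < 2:
        received.append((yield))
    # Falling off the end here makes the send() that delivered the
    # second value raise StopIteration in the caller.


cr = stops_after_two()
next(cr)          # prime: run up to the first yield
cr.send("a")      # first value; the generator pauses at yield again
try:
    cr.send("b")  # second value; the generator finishes
    raised = False
except StopIteration:
    raised = True

# Python 3.4+: contextlib.suppress ignores the exception without try/except.
sent = []
cr = stops_after_two()
next(cr)
with suppress(StopIteration):
    for value in ["a", "b", "c"]:
        sent.append(value)
        cr.send(value)
# "c" is never sent: the with block exits when send("b") raises StopIteration.
```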
A slightly cleaner approach, which addresses both of your questions, is to explicitly close the generator by calling its `close()` method. That raises a `GeneratorExit` exception at the paused `yield`, which the generator can catch in order to break out of the loop. Doing so also means no `StopIteration` needs to be "swallowed". Another benefit is that the `-1` sentinel value at the end of the list is no longer necessary.
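To see the mechanism `close()` relies on in isolation, here is a minimal sketch with a hypothetical `listener()` coroutine that logs each step. `close()` raises `GeneratorExit` at the paused `yield`; if the generator catches it and finishes without yielding again, `close()` returns quietly, and any code after the loop (like the final `print` in `RandomSelect`) still runs.

```python
def listener(log):
    """Hypothetical coroutine that records what happens to it in `log`."""
    while True:
        try:
            item = yield
        except GeneratorExit:
            # close() raises GeneratorExit at the paused yield.
            log.append("GeneratorExit")
            break
        log.append(item)
    # Code after the loop still runs before the generator finishes.
    log.append("cleanup")


log = []
cr = listener(log)
next(cr)    # prime the coroutine
cr.send(1)
cr.send(2)
cr.close()  # returns normally because the generator finished cleanly
print(log)  # [1, 2, 'GeneratorExit', 'cleanup']
```

If the generator yielded another value after catching `GeneratorExit`, `close()` would raise `RuntimeError` instead, so the handler should end the loop rather than continue it.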

    from random import Random

    def RandomSelect(knum, rand=None):
        ''' (int, func) -> list
        Reservoir Sampling implementation
        '''
        selection = None
        k_elems_list = []
        count = 0
        if rand is None:
            rand = Random()
        while True:
            try:
                item = yield selection
            except GeneratorExit:
                break  # close() was called: stop sampling
            if len(k_elems_list) < knum:
                k_elems_list.append(item)
            else:
                # Randomly replace elements in the reservoir with a decreasing probability
                r = rand.randint(0, count)
                if r < knum:
                    k_elems_list[r] = item
            count += 1
        print(k_elems_list)


    myList = [1, 2, 3, 4, 5, 6, 7, 8]
    cr = RandomSelect(3)
    next(cr)  # advance to the yield statement, otherwise I can't call send
    for val in myList:
        cr.send(val)
    cr.close()
    del cr

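In both versions `selection` is always `None`, so the result is only visible through the final `print`. One possible variation, sketched below under the assumption that you would rather receive the sample than print it, assigns a copy of the reservoir to `selection` after each item, so every `send()` returns the current sample. The name `reservoir_sampler` is hypothetical.

```python
import random


def reservoir_sampler(knum, rand=None):
    """Hypothetical variant of RandomSelect: each send() returns a copy
    of the current reservoir, so no final print is needed."""
    if rand is None:
        rand = random.Random()
    k_elems_list = []
    count = 0
    selection = None
    while True:
        item = yield selection
        if len(k_elems_list) < knum:
            k_elems_list.append(item)
        else:
            # Same replacement rule as RandomSelect.
            r = rand.randint(0, count)
            if r < knum:
                k_elems_list[r] = item
        count += 1
        # Hand back a snapshot so later sends don't change it.
        selection = list(k_elems_list)


cr = reservoir_sampler(3)
next(cr)  # still needs priming
sample = None
for val in [1, 2, 3, 4, 5, 6, 7, 8]:
    sample = cr.send(val)
cr.close()
print(sample)
```

Returning a copy rather than `k_elems_list` itself keeps the caller's result from changing underneath it on the next `send()`.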
A minor additional enhancement (about something you didn't ask about) would be to make it unnecessary to manually advance to the `yield` statement before calling `send()`. A good way to accomplish that is with a decorator function similar to the one named `consumer()` that David Beazley described in his "Generator Tricks For Systems Programmers" presentation at PyCon 2008:

    import functools

    def coroutine(func):
        """ Decorator that takes care of starting a coroutine automatically. """
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def start(*args, **kwargs):
            cr = func(*args, **kwargs)
            next(cr)  # advance to the first yield
            return cr
        return start

    @coroutine
    def RandomSelect(knum, rand=None):
        # ... body unchanged from the version above ...
        print(k_elems_list)

    myList = [1, 2, 3, 4, 5, 6, 7, 8]
    cr = RandomSelect(3)
    # next(cr)  # NO LONGER NECESSARY
    for val in myList:
        cr.send(val)
    cr.close()
    del cr

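The priming step exists because a just-started generator hasn't reached a `yield` yet, so calling `send()` with a non-`None` value raises `TypeError`. The self-contained sketch below demonstrates that with a hypothetical `collector()` coroutine. It then applies a copy of the `coroutine()` decorator (with `functools.wraps` added so the wrapper keeps the original function's name) and uses `inspect.getgeneratorstate()` to confirm the generator is already suspended at its first `yield`.

```python
import functools
import inspect


def coroutine(func):
    """Decorator that primes a generator-based coroutine."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)  # run to the first yield so send() works immediately
        return cr
    return start


def collector(results):
    """Hypothetical coroutine that appends every value it receives."""
    while True:
        results.append((yield))


# Without priming, sending a non-None value fails.
unprimed = collector([])
try:
    unprimed.send("x")
    unprimed_error = None
except TypeError as exc:
    unprimed_error = exc

# With the decorator, the coroutine is suspended at its first yield.
primed_collector = coroutine(collector)
results = []
cr = primed_collector(results)
state_before = inspect.getgeneratorstate(cr)  # 'GEN_SUSPENDED'
cr.send("x")
cr.close()
```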