My task is actually quite simple, but I cannot figure out how to achieve it. I intend to use this in my ML algorithm, but let's simplify the example. Suppose there is a generator like the following:
    nums = ((i+1) for i in range(4))

The above will yield 1, 2, 3 and 4.
Suppose that the above generator returns individual "samples". I want to write a generator method that will batch them up. Suppose the batch size is 2. So if this new method is called:
    def batch_generator(batch_size):
        do something on nums
        yield batches of size batch_size
then the output of this batch generator would be (1, 2) and then (3, 4). Tuples/lists do not matter; what matters is how to return these batches. I found the yield from syntax that was introduced in Python 3.3, but it does not seem useful in my case.
And obviously, if we had 5 nums instead of 4, with batch_size still 2, we would omit the last value yielded by the first generator.
My own solution for this could be:
    nums = (i+1 for i in range(4))

    def giveBatch(gen, numOfItems):
        try:
            return [next(gen) for _ in range(numOfItems)]
        except StopIteration:
            # an incomplete final batch is discarded (None is returned)
            pass
    giveBatch(nums, 2)
    # [1, 2]
    giveBatch(nums, 2)
    # [3, 4]
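Alternatively, a fully generator-based variant (my own sketch, built on itertools.islice; the name batch_generator is taken from the question's pseudocode) keeps yielding batches lazily and drops a trailing incomplete batch, as required:

```python
from itertools import islice

def batch_generator(gen, batch_size):
    # Repeatedly slice off batch_size items from the underlying
    # generator; stop as soon as a full batch can no longer be
    # formed, so an incomplete final batch is omitted.
    while True:
        batch = list(islice(gen, batch_size))
        if len(batch) < batch_size:
            return
        yield batch

nums = (i+1 for i in range(5))
print(list(batch_generator(nums, 2)))  # [[1, 2], [3, 4]]
```

Because islice pulls items straight from the source generator, no batch is materialized before it is requested.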
Another solution is to use grouper, as @Bharel mentioned. I have compared the time it takes to run both of these solutions; there is not much of a difference, so it can probably be neglected.
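For reference, the grouper recipe from the itertools documentation (which I assume is the one @Bharel's answer refers to) pads the last group with a fill value rather than dropping it:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # itertools-docs recipe: collect data into fixed-length chunks.
    # The same iterator object is repeated n times, so each step of
    # zip_longest pulls n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper(range(1, 6), 2)))  # [(1, 2), (3, 4), (5, None)]
```

Note the different edge-case behavior: with 5 items and n=2 it yields (5, None) instead of omitting the incomplete batch.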
    from timeit import timeit

    def wrapper(func, *args, **kwargs):
        def wrapped():
            return func(*args, **kwargs)
        return wrapped

    nums = (i+1 for i in range(1000000))
    wrappedGiveBatch = wrapper(giveBatch, nums, 2)
    timeit(wrappedGiveBatch, number=1000000)
    # ~ 0.998439

    nums = (i+1 for i in range(1000000))  # fresh generator for the second run
    wrappedGrouper = wrapper(grouper, nums, 2)
    timeit(wrappedGrouper, number=1000000)
    # ~ 0.734342