Tags: python, python-3.x, yield-keyword

Generator that is based on another generator


My task is actually quite simple, but I cannot figure out how to achieve it. I intend to use this in my ML algorithm, but let's simplify the example. Suppose there is a generator like the following:

nums = ((i+1) for i in range(4))

The above will yield 1, 2, 3, and 4.

Suppose that the above generator returns individual "samples". I want to write a generator function that batches them up. Suppose the batch size is 2. So if this new function is called:

def batch_generator(batch_size):
    # do something on nums
    # yield batches of size batch_size
    ...

And then the output of this batch generator would be 1 and 2, and then 3 and 4. Whether the batches are tuples or lists does not matter; what matters is how to return these batches. I found the yield from syntax that was introduced in Python 3.3, but it does not seem useful in my case.

And obviously, if we had 5 nums instead of 4 and batch_size were 2, we would omit the last value yielded by the first generator.


Solution

  • My own solution for this could be:

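    nums = (i+1 for i in range(4))
    
    def giveBatch(gen, numOfItems):
        # Pull numOfItems values from gen. If gen runs out first, return None,
        # which drops any incomplete final batch.
        try:
            return [next(gen) for _ in range(numOfItems)]
        except StopIteration:
            return None
    
    giveBatch(nums, 2)
    # [1, 2]
    giveBatch(nums, 2)
    # [3, 4]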
    

    Another solution is to use grouper, as @Bharel mentioned. I compared the running time of both solutions, and the difference is small enough that I think it can be neglected.

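    from itertools import zip_longest
    from timeit import timeit
    
    # grouper is not defined in this answer; it is assumed here to be the
    # recipe from the itertools documentation.
    def grouper(iterable, n, fillvalue=None):
        args = [iter(iterable)] * n
        return zip_longest(*args, fillvalue=fillvalue)
    
    def wrapper(func, *args, **kwargs):
        # Bind the arguments so that timeit can call func with no parameters.
        def wrapped():
            return func(*args, **kwargs)
        return wrapped
    
    nums = (i+1 for i in range(1000000))
    
    wrappedGiveBatch = wrapper(giveBatch, nums, 2)
    timeit(wrappedGiveBatch, number=1000000)
    # ~ 0.998439
    
    # Caveat: nums is shared with the run above, and grouper only builds a
    # lazy iterator on each call, so this run does not consume any batches.
    wrappedGrouper = wrapper(grouper, nums, 2)
    timeit(wrappedGrouper, number=1000000)
    # ~ 0.734342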
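
Since the question asks for a generator rather than a function that returns one batch per call, the same idea could be wrapped in a generator function. The following is only a minimal sketch under a few assumptions: the name batch_generator is borrowed from the question, but here it takes the source generator as an explicit gen argument (the question's signature only takes batch_size), and it uses itertools.islice to pull each batch. An incomplete final batch is dropped, as the question requires.

```python
from itertools import islice

def batch_generator(gen, batch_size):
    """Yield lists of batch_size items from gen, dropping an incomplete final batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(gen)  # accept any iterable, not only generators
    while True:
        # islice consumes at most batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if len(batch) < batch_size:
            return  # source exhausted; discard any leftover items
        yield batch

nums = (i + 1 for i in range(5))
batches = list(batch_generator(nums, 2))
print(batches)  # [[1, 2], [3, 4]]
```

With 5 input values and a batch size of 2, the trailing 5 is discarded. If tuples are acceptable, zip(*[iter(gen)] * batch_size) gives the same drop-the-remainder behaviour in one expression.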