
Iterate an iterator by chunks (of n) in Python?


Can you think of a nice way (maybe with itertools) to split an iterator into chunks of a given size?

For example, l = [1, 2, 3, 4, 5, 6, 7] with chunks(l, 3) should become an iterator over [1, 2, 3], [4, 5, 6], [7].

I can write a small program to do that, but I can't think of a nice way to do it, perhaps with itertools.
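For reference, here is a minimal sketch of that straightforward loop-based approach, without itertools. The name chunks() comes from the example above; yielding lists and keeping a shorter final chunk are assumptions based on the desired output.

```python
def chunks(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least one")
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == n:
            yield chunk
            chunk = []  # start a fresh list so the yielded one is not mutated
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


print(list(chunks([1, 2, 3, 4, 5, 6, 7], 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```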


Solution

  • The grouper() recipe from the itertools documentation comes close to what you want:

    from itertools import zip_longest

    def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
        "Collect data into non-overlapping fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
        # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
        # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
        # n references to the same iterator, so zip pulls n consecutive items
        args = [iter(iterable)] * n
        if incomplete == 'fill':
            return zip_longest(*args, fillvalue=fillvalue)
        if incomplete == 'strict':
            return zip(*args, strict=True)  # strict= requires Python 3.10+
        if incomplete == 'ignore':
            return zip(*args)
        else:
            raise ValueError('Expected fill, strict, or ignore')
    

    This doesn't handle an incomplete last chunk the way you want, though: depending on the incomplete mode, it either pads the last chunk with fillvalue, raises a ValueError, or silently drops the incomplete chunk.
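The key trick in grouper() is `[iter(iterable)] * n`: a list holding n references to one single iterator, so every output tuple consumes n consecutive items. The sketch below applies that trick to the example list to show the fill and ignore behaviors; the helper name shared_iterators() is hypothetical.

```python
from itertools import zip_longest


def shared_iterators(iterable, n):
    # n references to ONE iterator: zip advances it n times per output
    # tuple, so consecutive items land in the same tuple.
    return [iter(iterable)] * n


data = [1, 2, 3, 4, 5, 6, 7]

# 'fill' mode: the short last chunk is padded with the fill value.
filled = list(zip_longest(*shared_iterators(data, 3), fillvalue=None))
# 'ignore' mode: plain zip stops at the first exhausted argument, dropping 7.
ignored = list(zip(*shared_iterators(data, 3)))

print(filled)   # [(1, 2, 3), (4, 5, 6), (7, None, None)]
print(ignored)  # [(1, 2, 3), (4, 5, 6)]
```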

    More recent versions of the recipes include a batched() recipe that does exactly what you want:

    from itertools import islice

    def batched(iterable, n):
        "Batch data into tuples of length n. The last batch may be shorter."
        # batched('ABCDEFG', 3) --> ABC DEF G
        if n < 1:
            raise ValueError('n must be at least one')
        it = iter(iterable)
        while (batch := tuple(islice(it, n))):  # := requires Python 3.8+
            yield batch
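If you want lists, as in the question's example, rather than tuples, a minimal variant of the same islice() approach could look like this. The name batched_lists() is hypothetical, not part of the itertools recipes.

```python
from itertools import islice


def batched_lists(iterable, n):
    """Like the batched() recipe, but yields lists instead of tuples."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)  # one shared iterator, so islice resumes where it stopped
    # islice(it, n) takes up to n items; an empty list means `it` is exhausted.
    while batch := list(islice(it, n)):
        yield batch


print(list(batched_lists([1, 2, 3, 4, 5, 6, 7], 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```

Because it only pulls n items at a time, this also works lazily on infinite iterators such as itertools.count().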
    

    Finally, a less general solution that only works on sequences, but does handle the last chunk as desired and preserves the type of the original sequence, is:

    (my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size))
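Wrapping that expression in a small function (the name chunk_sequence() is hypothetical) makes the type preservation visible: lists give lists, strings give strings, tuples give tuples. It relies on len() and slicing, so it won't accept a plain iterator or generator.

```python
def chunk_sequence(seq, size):
    """Return a generator of slices of seq; each slice has seq's type."""
    if size < 1:
        raise ValueError("size must be at least one")
    # range() steps through the start index of every chunk; slicing past
    # the end of a sequence is safe and just yields a shorter last chunk.
    return (seq[i:i + size] for i in range(0, len(seq), size))


print(list(chunk_sequence([1, 2, 3, 4, 5, 6, 7], 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
print(list(chunk_sequence("ABCDEFG", 3)))              # ['ABC', 'DEF', 'G']
print(list(chunk_sequence((1, 2, 3, 4), 2)))           # [(1, 2), (3, 4)]
```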
    

    Since Python 3.12, you can also use itertools.batched directly. From the docs:

    itertools.batched(iterable, n)

    Batch data from the iterable into tuples of length n. The last batch may be shorter than n.
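If the same code has to run on versions older than 3.12, one option is to import itertools.batched when it exists and fall back to the pure-Python recipe otherwise. This is a sketch of that pattern, not an official compatibility shim; note that the fallback still needs Python 3.8+ for the := operator.

```python
from itertools import islice

try:
    from itertools import batched  # available since Python 3.12
except ImportError:
    def batched(iterable, n):
        # Fallback: the pure-Python batched() recipe from the itertools docs.
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch


data = [1, 2, 3, 4, 5, 6, 7]
# batched() yields tuples; convert them if you need lists.
print([list(b) for b in batched(data, 3)])  # [[1, 2, 3], [4, 5, 6], [7]]
```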