
Iterator yielding n-tuples from an iterator as a one-liner expression


What I'm looking for is a one-liner variant of the function batched(iterable, n) described in the code section of Itertools Recipes, which batches data into tuples of a given length.

Assume the source is an iterator of arbitrary length, e.g. an iteration over sys.stdin, which yields strings in my use case.

In the end, I would like to have a generator that yields tuples of a given length, with the last tuple potentially being shorter (depending on the total number of items).

AFAIK, batched(iterable, n) will be implemented in Python 3.12, which is due to be released later this year, yet I would like to learn what a one-liner solution could look like with the current release.

This is what I've come up with so far (for an example tuple-length of 2):

from itertools import islice, zip_longest

foo = ('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg')  # simulates sys.stdin

# a one-liner that gets close but pads missing elements with None, so
# list(slicepairs0) would be [('aaa', 'bbb'), ('ccc', 'ddd'), ('eee', 'fff'), ('ggg', None)]
slicepairs0 = zip_longest(*[iter(foo)]*2)

# a one-liner that gets close but drops any leftover elements, so
# list(slicepairs1) would be [('aaa', 'bbb'), ('ccc', 'ddd'), ('eee', 'fff')]
slicepairs1 = zip(*[iter(foo)]*2)

# a generator function similar to how batched() is currently implemented
def giveslicepair(foo):
    fooi = iter(foo)
    while nextslice := tuple(islice(fooi, 2)):
        yield nextslice

# this iterator does what it should but relies on the generator function giveslicepair(), so
# list(slicepairs2) would be [('aaa', 'bbb'), ('ccc', 'ddd'), ('eee', 'fff'), ('ggg',)]
slicepairs2 = (item for item in giveslicepair(foo))

I tried to fold the functionality of giveslicepair() into the iterator expression on the last line but couldn't get it to work. I feel like I'm overlooking something obvious here and would be thankful for hints on how to do this in a performant and Pythonic way.

Side note: in the real-world application, the tuple size is expected to be somewhere around 50 to 400 rather than just 2. The number of lines being fed in may vary greatly, anywhere from 1 to billions.
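To make the scale concern concrete, here is a minimal sketch of lazy batching over a line stream. It reuses the idea of giveslicepair() with the batch size as a parameter. The names batches and fake_stdin, the sample input, and the batch size of 3 are assumptions for illustration; io.StringIO merely stands in for sys.stdin.

```python
import io
from itertools import islice

def batches(lines, n):
    # Same idea as giveslicepair(), with the batch size as a parameter.
    it = iter(lines)
    while batch := tuple(islice(it, n)):
        yield batch

# io.StringIO stands in for sys.stdin (hypothetical sample input).
fake_stdin = io.StringIO("aaa\nbbb\nccc\nddd\neee\nfff\nggg\n")
stripped = (line.rstrip("\n") for line in fake_stdin)

gen = batches(stripped, 3)
first = next(gen)   # only the first three lines have been read at this point
rest = list(gen)    # the remaining batches, the last one being shorter
```

Because each batch is built on demand, memory use depends on the batch size rather than on the total number of lines.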

EDIT

For the sake of completeness and based on the accepted answer, the last line I was looking for (albeit without the comprehension construct that I thought would be the way to go) could be written as:

# list(slicepairs3) would also be [('aaa', 'bbb'), ('ccc', 'ddd'), ('eee', 'fff'), ('ggg',)]
# but without the need for calling giveslicepair(foo)
slicepairs3 = iter(lambda it=iter(foo): tuple(islice(it, 2)), tuple())
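Since a comprehension was the original goal, one possible way to get there is to bind the iterator inside a generator expression with a one-element for clause. This is only a sketch, not something taken from the accepted answer; slicepairs4 is a hypothetical name, and the expression is split across lines purely for readability.

```python
from itertools import islice

foo = ('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg')

# The first for clause binds a single shared iterator to `it`;
# the second keeps pulling slices until the empty tuple (the sentinel) appears.
slicepairs4 = (
    batch
    for it in (iter(foo),)
    for batch in iter(lambda: tuple(islice(it, 2)), ())
)

result = list(slicepairs4)
```

It still relies on iter() with a sentinel internally, so it is arguably no simpler than slicepairs3.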

Solution

  • If you want a one-liner, you can use the two-argument form of iter() with a lambda:

    from itertools import islice
    
    def one_line_batched(iterable, n):
        return iter(lambda it=iter(iterable): tuple(islice(it, n)), tuple())
    
    print(list(one_line_batched('ABCDEFG', 3)))
    

    Prints:

    [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
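Why this works: the two-argument form iter(callable, sentinel) calls the callable with no arguments over and over and stops as soon as a return value equals the sentinel. The default argument it=iter(iterable) is evaluated once, when the lambda is created, so every call draws from the same iterator. The sketch below spells out the individual calls; take3, calls, batches and empty are illustrative names only.

```python
from itertools import islice

it = iter('ABCDEFG')
take3 = lambda: tuple(islice(it, 3))  # each call consumes up to 3 items from `it`

# Four manual calls: the last one finds the iterator exhausted
# and returns the empty tuple, which equals the sentinel.
calls = [take3(), take3(), take3(), take3()]

# iter(callable, sentinel) performs the same calls and stops at the sentinel.
batches = list(iter(lambda it=iter('ABCDEFG'): tuple(islice(it, 3)), ()))

# An empty source produces no batches at all.
empty = list(iter(lambda it=iter(''): tuple(islice(it, 3)), ()))
```

The approach assumes n is at least 1: with n equal to 0 the very first call already returns the empty tuple, so nothing is yielded.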