
itertools.groupby: iterate over groups pairwise


How can I iterate over groupby results in pairs? What I tried isn't quite working:

from itertools import groupby,izip

groups = groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)],key=len)

def grouped(iterable, n):    
    return izip(*[iterable]*n)

for g, gg in grouped(groups,2):
    print list(g[1]), list(gg[1])

Output I get:

[] [(1, 2), (1, 2)]
[] [(3, 4)]

Output I would like to have:

[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]

Solution

  • import itertools as IT
    
    groups = IT.groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)], key=len)
    groups = (list(group) for key, group in groups)
    
    def grouped(iterable, n):
        return IT.izip(*[iterable]*n)
    
    for p1, p2 in grouped(groups, 2):
        print p1, p2
    

    yields

    [(1, 2, 3)] [(1, 2), (1, 2)]
    [(3, 4, 5)] [(3, 4)]
    

    The code you posted is very interesting. It has a mundane problem, and a subtle problem.

    The mundane problem is that itertools.groupby returns an iterator that yields a (key, group) pair on each iteration. Your code works around this by indexing g[1] and gg[1], but since you are interested only in the groups, not the keys, it is cleaner to drop the keys up front with something like

    groups = (group for key, group in groups)
    

    The subtle problem is harder to explain. The iterator returned by groupby does not copy its input,

    [(1,2,3),(1,2),(1,2),(3,4,5),(3,4)]
    

    but wraps a single iterator over it, much as a csv.reader wraps an underlying file object iterator. You get one pass through this iterator and one pass only. Each group that groupby yields is itself a lazy iterator that reads from that same underlying iterator, so a group is only usable until groupby advances to the next one. itertools.izip, in the process of pairing items in groups, pulls two groups before handing them to the loop body, which advances groupby from the first group to the second. By then the first group's items have already been skipped over, so list(g[1]) comes back empty.

    A not-so-satisfying fix to this problem is to convert the iterators in groups into lists:

    groups = (list(group) for key, group in groups)
    

    so each group's items are captured before itertools.izip advances groupby past them. Edit: On second thought, this fix is not so bad. groups remains a lazy generator, and each group is turned into a list only as it is consumed.
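
The behavior described in this answer can be checked with a short script. This is only a sketch, and it assumes Python 3 rather than the Python 2 used above: there the built-in zip is lazy and takes the place of itertools.izip, and print is a function. The names data, lazy_groups, list_groups, broken and fixed are introduced here purely for illustration.

```python
from itertools import groupby

data = [(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)]


def grouped(iterable, n):
    """Yield non-overlapping n-tuples drawn from one shared iterator."""
    return zip(*[iter(iterable)] * n)


# The problem: every group is a lazy iterator over the same underlying input.
# zip() fetches two groups before the loop body runs, and fetching the second
# group moves groupby past the first one, so the first list comes out empty.
lazy_groups = (group for _key, group in groupby(data, key=len))
broken = [(list(g), list(gg)) for g, gg in grouped(lazy_groups, 2)]

# The fix: turn each group into a list at the moment it is produced,
# before groupby has been asked for the next group.
list_groups = (list(group) for _key, group in groupby(data, key=len))
fixed = list(grouped(list_groups, 2))

print(broken)  # [([], [(1, 2), (1, 2)]), ([], [(3, 4)])]
print(fixed)   # [([(1, 2, 3)], [(1, 2), (1, 2)]), ([(3, 4, 5)], [(3, 4)])]
```

The first line printed reproduces the empty lists from the question, and the second matches the desired output. Only one group is held in memory as a list at any moment until zip pairs it, which is why the list conversion costs less than it might first appear.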