python, python-2.7, iterator, python-itertools

itertools.groupby returning wrong result (this is not about sorting)


I wanted to break a string into words while keeping the index where each word starts. For example, I want to transform 'aaa bbb ccc' into [(0, 'aaa'), (4, 'bbb'), (8, 'ccc')]. This is just the background, not the question.

The problem is that I tried to use itertools.groupby with str.isalpha as key, but it's giving me weird results.

This code shows what I'm talking about (please ignore the list calls everywhere; I just wanted to be sure I was dealing with iterables, not iterators):

from itertools import groupby

text = 'aaa bbb ccc'

chars = list(groupby(list(enumerate(text)), lambda x: x[1].isalpha()))

result = [list(v) for k, v in chars if k] 

print result
assert result == [
        [(0, 'a'), (1, 'a'), (2, 'a')],
        [(4, 'b'), (5, 'b'), (6, 'b')],
        [(8, 'c'), (9, 'c'), (10, 'c')]]

The variable result is ending up as [[(10, 'c')], [], []] and I don't know why. Maybe I'm missing something really simple here, but I just can't see it.


Solution

  • Correct the code:

    l = list(enumerate(text))
    # Do not wrap groupby in list(); consume each group while iterating.
    chars = groupby(l, lambda x: x[1].isalpha())
    result = [list(v) for k, v in chars if k]
    

    To see where the weird output comes from, step through the groups by hand:

    >>> l = list(enumerate(text))
    
    >>> chars = groupby(l, lambda x: x[1].isalpha())
    
    >>> list(chars.next()[1])
    [(0, 'a'), (1, 'a'), (2, 'a')]
    
    >>> for k,v in list(chars): print list(v)
    []
    [(10, 'c')]
    []
    []
    

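    Each group that groupby yields is a sub-iterator that shares the same underlying iterator as the groupby object itself. Calling list on the groupby object advances that shared iterator to the end, so by the time you call list(v) on a stored group, its items have already been consumed, except for whatever element groupby was still holding (here (10, 'c')). Consume each group before moving on to the next one.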
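For the original goal of pairing each word with its start index, here is a minimal sketch. It assumes a word is any run of alphabetic characters; the helper name words_with_index is made up for illustration and is not from the question. The key point is that each group is turned into a list inside the loop, before groupby advances to the next group.

```python
from itertools import groupby


def words_with_index(text):
    """Return (start_index, word) pairs for each alphabetic run in text.

    Hypothetical helper for illustration only.
    """
    words = []
    for is_word, group in groupby(enumerate(text), lambda pair: pair[1].isalpha()):
        # Materialize the group right away: it shares its iterator with
        # groupby, so it is no longer usable once groupby moves on.
        items = list(group)
        if is_word:
            start = items[0][0]
            word = ''.join(ch for _, ch in items)
            words.append((start, word))
    return words


print(words_with_index('aaa bbb ccc'))
# [(0, 'aaa'), (4, 'bbb'), (8, 'ccc')]
```

The single-argument print call is written so that it runs unchanged on Python 2.7 and Python 3.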