
Python itertools groupby not grouping as I expect


Suppose I have a string like so:

st='''Line 1
Line 2
Line 3
Line 4

Line 5
Line 6

Line 7
Line 8 
Line 9

Line 10
Line 11
Line 12
Line 13
Line 14'''
# may be really big...

Now suppose I want a list of lists (LoL), with the lines grouped by the blank lines between them:

[['Line 1', 'Line 2', 'Line 3', 'Line 4'],
 ['Line 5', 'Line 6'],
 ['Line 7', 'Line 8 ', 'Line 9'],
 ['Line 10', 'Line 11', 'Line 12', 'Line 13', 'Line 14']]

I know that I can create that LoL with a regex split:

import re
[x.splitlines() for x in re.split(r'^\s*\n', st, flags=re.MULTILINE)]

However, I am trying to build this with a non-regex Python generator. The closest I have gotten is this horrible thing (which still includes the blank lines as their own groups and, I know, is not at all efficient):

import itertools

result = []
for sub in (group for key, group in itertools.groupby(st.splitlines(), lambda x: not x.rstrip())):
    result.append(list(sub))

print(result)

Any hints on a direction to go?

I am somewhat keying off this SO question.


Solution

  • I'd probably write

    >>> grouped = itertools.groupby(map(str.strip, st.splitlines()), bool)
    >>> [list(g) for k,g in grouped if k]
    [['Line 1', 'Line 2', 'Line 3', 'Line 4'], ['Line 5', 'Line 6'], 
    ['Line 7', 'Line 8', 'Line 9'], ['Line 10', 'Line 11', 'Line 12', 'Line 13', 'Line 14']]
    

    This also treats whitespace-only lines as blank, which splitting on '\n\n' won't. On the other hand, it strips leading and trailing whitespace from every line, which, judging from the 'Line 8 ' example, you may want to keep. If that matters, you could do:

    grouped = itertools.groupby(st.splitlines(), lambda x: bool(x.strip()))
    

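    (which, looking at it, is pretty close to what you're already doing. The key is just inverted, so it is True for lines with content, and the same `if k` filter then drops the blank groups.)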
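
Since the question asks for a generator and mentions that the input may be really big, here is a minimal sketch of the whitespace-preserving variant wrapped in a generator function. The name `blank_line_groups` and the short sample string are made up for illustration; the grouping itself is the same `groupby` call as above.

```python
import itertools


def blank_line_groups(lines):
    """Yield lists of consecutive non-blank lines from any iterable of strings.

    Hypothetical helper name, not part of itertools.
    """
    # The key is True for lines with visible content and False for
    # empty or whitespace-only lines, so blank runs form their own groups.
    for has_text, group in itertools.groupby(lines, lambda line: bool(line.strip())):
        if has_text:
            # Lines are kept exactly as given, including trailing spaces.
            yield list(group)


st = "Line 1\nLine 2\n\nLine 3 \n   \nLine 4"
print(list(blank_line_groups(st.splitlines())))
# [['Line 1', 'Line 2'], ['Line 3 '], ['Line 4']]
```

Because `groupby` consumes its input lazily, you could probably pass an open file object instead of `st.splitlines()` and only one group would be held in memory at a time. In that case each line keeps its trailing newline, so you may want to strip that yourself.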