Suppose I have a string like so:
st='''Line 1
Line 2
Line 3
Line 4

Line 5
Line 6

Line 7
Line 8
Line 9

Line 10
Line 11
Line 12
Line 13
Line 14'''
# may be really big...
Now suppose I want a list of lists (LoL), grouped by the blank lines:
[['Line 1', 'Line 2', 'Line 3', 'Line 4'],
['Line 5', 'Line 6'],
['Line 7', 'Line 8 ', 'Line 9'],
['Line 10', 'Line 11', 'Line 12', 'Line 13', 'Line 14']]
I know that I can create that LoL with a regex split:
[x.splitlines() for x in re.split(r'^\s*\n', st, flags=re.MULTILINE)]
However, I am trying to create this with a non-regex Python generator. The closest I have gotten is this horrible thing (which includes the blank lines and, I know, is not at all efficient):
import itertools

result = []
for sub in (group for key, group in itertools.groupby(st.splitlines(), lambda x: not x.rstrip())):
    result.append(list(sub))
print(result)
Any hints on a direction to go?
I am somewhat keying off THIS SO question.
I'd probably write
>>> grouped = itertools.groupby(map(str.strip, st.splitlines()), bool)
>>> [list(g) for k,g in grouped if k]
[['Line 1', 'Line 2', 'Line 3', 'Line 4'], ['Line 5', 'Line 6'],
['Line 7', 'Line 8', 'Line 9'], ['Line 10', 'Line 11', 'Line 12', 'Line 13', 'Line 14']]
This will also handle blank lines with whitespace, which \n\n-based splitting won't. On the other hand, it doesn't preserve leading and trailing whitespace, which from the 'Line 8 ' example you may want. If that matters, you could do:
grouped = itertools.groupby(st.splitlines(), lambda x: bool(x.strip()))
(which, looking at it, is pretty close to what you're already doing.)
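If you want the whitespace-preserving variant as a reusable generator, a minimal sketch could look like the one below. The names `blank_line_groups` and `sample` are made up here for illustration, and the sample string is a shortened stand-in for your `st`. Since it accepts any iterable of lines, it should also work on an open file object, which avoids building the full `splitlines()` list for really big input.

```python
import itertools

def blank_line_groups(lines):
    """Yield lists of consecutive non-blank lines from an iterable of lines."""
    # The key is True for lines with non-whitespace content, False for blank ones.
    for has_text, group in itertools.groupby(lines, lambda line: bool(line.strip())):
        if has_text:
            # Strip only the line terminator so other whitespace is preserved.
            yield [line.rstrip('\r\n') for line in group]

# Hypothetical sample: one truly empty line and one whitespace-only line.
sample = 'Line 1\nLine 2\n\nLine 3 \n   \nLine 4\nLine 5'
print(list(blank_line_groups(sample.splitlines())))
# [['Line 1', 'Line 2'], ['Line 3 '], ['Line 4', 'Line 5']]
```

The `if has_text` test is what drops the blank groups, and skipping `str.strip` on the kept lines is what lets the trailing space on 'Line 3 ' survive.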