Let me demonstrate. This has bitten me twice; the first time, I gave up, thinking that I just couldn't understand how `groupby` works. I am using Python 3.6.
I have a list of elements in `x.y` format, which I want to group by `y`.
```python
import itertools

a = ['1D.5', '2D.5', '3D.5', '1D.10', '2D.10', '3D.10', '1D.20', '2D.20', '3D.20', '1D.100', '2D.100', '3D.100']
groups = list(itertools.groupby(a, key=lambda x: x.split('.')[-1]))
for gname, glist in groups:
    print(list(glist))
```
This results in the following:

```
[]
[]
[]
['3D.100']
```
Strange! However, this works:
```python
groups = itertools.groupby(a, key=lambda x: x.split('.')[-1])
for gname, glist in groups:
    print(list(glist))
```
```
['1D.5', '2D.5', '3D.5']
['1D.10', '2D.10', '3D.10']
['1D.20', '2D.20', '3D.20']
['1D.100', '2D.100', '3D.100']
```
The only difference is that this time I didn't call `list` on the result of `itertools.groupby`. There is surely some technical reason behind this behaviour, but given my experience with Python generators, it is very counterintuitive and probably wrong!
Why would calling `list` on an iterator invalidate its contents?
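A smaller reproduction avoids `list` entirely and steps the `groupby` object by hand with `next()`. The input string `'AAABBB'` and the variable names are made up for illustration. It suggests the effect comes from advancing the `groupby` object, not from `list` itself: a group obtained earlier is empty once the object has moved on, while copying the group before advancing keeps its items.

```python
import itertools

# Step the groupby object by hand instead of calling list() on it.
gb = itertools.groupby('AAABBB')
key_a, group_a = next(gb)   # group for 'A'
key_b, group_b = next(gb)   # advancing gb moves the shared source past the A's

stale_a = list(group_a)     # [] : group_a is no longer usable
fresh_b = list(group_b)     # ['B', 'B', 'B'] : the current group still works

# Same steps, but copy each group before advancing.
gb = itertools.groupby('AAABBB')
key_a, group_a = next(gb)
saved_a = list(group_a)     # ['A', 'A', 'A'], copied while it is still current
key_b, group_b = next(gb)
saved_b = list(group_b)     # ['B', 'B', 'B']

print(stale_a, fresh_b, saved_a, saved_b)
```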
PS: The documentation of `groupby` has the following two lines in its implementation details:
```python
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
```
Am I right in suspecting that `list(some_generator)` is not guaranteed to give the same results as the "equivalent" list comprehension?
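Regarding the PS: here `list(...)` over a generator expression and the matching list comprehension behave the same. What differs between the documentation's second line and the code in the question is *when* each group is read. The minimal sketch below uses the documentation's own `'AAAABBBCCD'` input. The variable names are illustrative.

```python
import itertools

data = 'AAAABBBCCD'

# Comprehension: list(g) runs for each group before groupby advances.
eager = [list(g) for k, g in itertools.groupby(data)]

# list() over the equivalent generator expression gives the same result,
# because list(g) still runs before the next group is produced.
via_list = list(list(g) for k, g in itertools.groupby(data))

# list(groupby(...)) first exhausts the groupby object; the groups are only
# read afterwards, once the shared source has already moved past them.
pairs = list(itertools.groupby(data))
keys = [k for k, g in pairs]        # the keys survive
late = [list(g) for k, g in pairs]  # the earlier groups come back empty

print(eager)     # [['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]
print(via_list)  # same as eager
print(keys)      # ['A', 'B', 'C', 'D']
print(late)
```

The exact contents of a stale group are an implementation detail. On 3.6, the question's last group still returned `'3D.100'`, so code should not rely on them either way.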
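`groupby`'s group iterators depend on the prior ones: they all share one underlying iterator with the `groupby` object, so a group is only valid until the `groupby` object advances to the next group. The proper way to make a list out of those group iterators is therefore to copy each one as it is produced: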
```python
list((g, list(it)) for g, it in itertools.groupby(a, key=lambda x: x.split('.')[-1]))
```
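The sketch below applies this pattern to the list `a` from the question. It builds the `(key, members)` pairs and adds a `dict` variation. The helper name `suffix` is made up for illustration. The `dict` form only makes sense when each key forms a single contiguous run, since a repeated key would overwrite the earlier run.

```python
import itertools

a = ['1D.5', '2D.5', '3D.5', '1D.10', '2D.10', '3D.10',
     '1D.20', '2D.20', '3D.20', '1D.100', '2D.100', '3D.100']

def suffix(item):
    # Hypothetical helper: the group key is the part after the last '.',
    # e.g. '1D.100' -> '100'
    return item.split('.')[-1]

# list(it) copies each group as it is produced, before groupby advances.
groups = [(g, list(it)) for g, it in itertools.groupby(a, key=suffix)]

for gname, glist in groups:
    print(gname, glist)

# Variation: a dict keyed by suffix (assumes each suffix is one contiguous run).
by_suffix = {g: list(it) for g, it in itertools.groupby(a, key=suffix)}
```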
The outer `list` alone will not copy the inner iterators `it`, which can only be read once, and only in sequence.
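The simplified model below illustrates why the inner iterators can only be read once and in order. `toy_groupby` is hypothetical and is not CPython's implementation; it is only written in the spirit of the "roughly equivalent" Python code in the `itertools` documentation. Every group pulls from one shared source iterator. Before yielding the next group, the outer generator skips whatever the previous group left unread.

```python
def toy_groupby(iterable, key=lambda x: x):
    """Hypothetical, simplified model of itertools.groupby for illustration."""
    source = iter(iterable)        # ONE iterator, shared by every group
    done = object()                # sentinel marking the end of the source
    current = next(source, done)   # item the shared source is positioned at

    def group(target):
        nonlocal current
        # Pull items from the shared source while they belong to this group.
        while current is not done and key(current) == target:
            yield current
            current = next(source, done)

    while current is not done:
        target = key(current)
        g = group(target)
        yield target, g
        # Before moving to the next group, drain what this group left unread.
        # After this, g is exhausted, so reading it later gives nothing.
        for _ in g:
            pass


# Reading each group as it is produced works:
print([(k, list(g)) for k, g in toy_groupby('AAAABBBCCD')])
# Collecting the pairs first leaves every group drained:
print([list(g) for k, g in list(toy_groupby('AAAABBBCCD'))])  # [[], [], [], []]
```

In this model every stale group ends up empty. The real `itertools.groupby` on Python 3.6 still returned `'3D.100'` for the question's last group. That is one more reason not to touch a group after the `groupby` object has advanced.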