Tags: python, group-by, python-itertools

Counterintuitive behaviour of `list(itertools.groupby)`


Let me demonstrate. It has bitten me twice, and the first time I gave up, thinking that I couldn't understand how groupby works. I am using Python 3.6.

I have a list of elements in x.y format which I want to group by y.

import itertools

a = ['1D.5', '2D.5', '3D.5', '1D.10', '2D.10', '3D.10', '1D.20', '2D.20', '3D.20', '1D.100', '2D.100', '3D.100']
# Calling list() here exhausts the groupby object before any group is read
groups = list(itertools.groupby(a, key=lambda x: x.split('.')[-1]))
for gname, glist in groups:
    print(list(glist))

This results in the following:

[]
[]
[]
['3D.100']

Strange!

However, this works:

groups = itertools.groupby(a, key=lambda x: x.split('.')[-1])
for gname, glist in groups:
    print(list(glist))

['1D.5', '2D.5', '3D.5']
['1D.10', '2D.10', '3D.10']
['1D.20', '2D.20', '3D.20']
['1D.100', '2D.100', '3D.100']

The only difference is that this time I didn't call list on the itertools.groupby object. There is definitely some technical reason behind this behaviour, but given my experience with Python generators, it is very counterintuitive and probably wrong!

Why would calling list on an iterator invalidate its content?

PS: The documentation of groupby has the following two lines in its implementation details.

# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

Am I right in suspecting that list(some generator) is not guaranteed to give the same results as the "equivalent" list comprehension?


Solution

  • Each group iterator returned by groupby shares the underlying iterable with the groupby object itself, so advancing to the next group invalidates the previous one. The proper way to make a list out of those group iterators is therefore to copy each one as it is produced.

    list((g, list(it)) for g, it in itertools.groupby(a, key=func))
    

    The outer list alone will not copy the inner iterators `it`, which can only be consumed once and only in sequence.
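
As a minimal, self-contained sketch of the point above, here is the fix applied to the list from the question. The helper name `suffix` is an assumption made for illustration; it simply stands in for the `func` / lambda key used earlier. The second half repeats the problematic pattern for contrast and deliberately makes no claim about its exact output beyond the groups being empty or truncated.

```python
import itertools

a = ['1D.5', '2D.5', '3D.5', '1D.10', '2D.10', '3D.10',
     '1D.20', '2D.20', '3D.20', '1D.100', '2D.100', '3D.100']


def suffix(item):
    # Hypothetical helper: the grouping key is the part after the last dot.
    return item.split('.')[-1]


# Copy each group into a list while it is still the current group,
# i.e. before groupby is advanced to the next key.
groups = [(key, list(group)) for key, group in itertools.groupby(a, key=suffix)]

for key, members in groups:
    print(key, members)

# For contrast: materialising only the outer iterator advances groupby
# to the end first, so the stored group iterators are stale by the time
# they are read and come out empty or truncated.
stale = [list(group) for _, group in list(itertools.groupby(a, key=suffix))]
print(stale)
```

Note that groupby only merges consecutive items with equal keys, so this relies on `a` already being ordered by the suffix, as it is in the question.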