I was playing around to get a better feeling for itertools groupby
, so I grouped a list of tuples by the number and tried to get a list of the resulting groups. When I convert the result of groupby
to a list however, I get a strange result: all but the last group are empty. Why is that? I assumed turning an iterator into a list would be less efficient but never change behavior. I guess the lists are empty because the inner iterators are traversed but when/where does that happen?
import itertools
l=list(zip([1,2,2,3,3,3],['a','b','c','d','e','f']))
#[(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (3, 'e'), (3, 'f')]
grouped_l = list(itertools.groupby(l, key=lambda x:x[0]))
#[(1, <itertools._grouper at ...>), (2, <itertools._grouper at ...>), (3, <itertools._grouper at ...>)]
[list(x[1]) for x in grouped_l]
[[], [], [(3, 'f')]]
grouped_i = itertools.groupby(l, key=lambda x:x[0])
#<itertools.groupby at ...>
[list(x[1]) for x in grouped_i]
[[(1, 'a')], [(2, 'b'), (2, 'c')], [(3, 'd'), (3, 'e'), (3, 'f')]]
From the itertools.groupby()
documentation:
The returned group is itself an iterator that shares the underlying iterable with
groupby()
. Because the source is shared, when thegroupby()
object is advanced, the previous group is no longer visible.
Turning the output from groupby()
into a list advances the groupby()
object.
Hence, you shouldn't be type-casting itertools.groupby
object to list. If you want to store the values as list
, then you should be doing something like this list comprehension in order to create copy of groupby
object:
grouped_l = [(a, list(b)) for a, b in itertools.groupby(l, key=lambda x:x[0])]
This will allow you to iterate your list (transformed from groupby
object) multiple times. However, if you are interested in only iterating the result once, then the second solution you mentioned in the question will suffice your requirement.