Search code examples
pythonpython-3.xpython-itertools

Using itertools groupby to find the group of all even numbers in a list


I am trying to understand the usefulness of the itertools.groupby() function and have created a rather naive use case. I have a list of numbers, and want to group them by oddness or evenness. Below is my code using itertools.groupby():

for decision, group in groupby(range(2, 11), key=lambda x: x % 2 == 0): 
    ...:     print(decision, list(group))

And below is the output I get:

True [2]
False [3]
True [4]
False [5]
True [6]
False [7]
True [8]
False [9]
True [10]

Basically what I was expecting is something like where all the "True's" are grouped together and all the "False's" are grouped together.

Is this even possible with groupby()?


Solution

  • groupby() combines consecutive values where their key output is equal, giving you a shared iterator for each such group. It will not process the whole input in one go, it gives you the groups as you iterate.

    For your example, the key changes for each value, and so the groups consist of a single value each; it starts with 2 and the key output is True, then next comes 3 and the key produces False. Because True != False, that's a new group. The next value 4 changes the key again, to True, so it's another group, etc.

    What you want to do can't be done with groupby(); to sort values into buckets across the whole iterable, just use a dictionary:

    grouped = {}
    for value in range(2, 11):
        key = value % 2 == 0
        grouped.setdefault(key, []).append(value)
    

    You could also sort your input first, with sorted(range(2, 11), key=lambda x: x % 2 == 0), but that's a waste of time. Sorting is a more complex algorithm, taking O(n log n) (quasilinear) time, whereas using a dictionary to group the values takes O(n) (linear) time. That might not matter when you have just 9 elements, but when you have to process 1000 elements, sorting would increase the time taken by a factor of nearly 10, for 1 million elements, a factor of nearly 20, etc.