Search code examples
pythonalgorithmpython-itertools

Improve itertools.pairwise() function


I am trying to create a custom function that improves the itertools pairwise function.

Unlike the pairwise function, which returns pairs of items (n=2), I would like to be able to specify the grouping length (n=x). So that I can return groups of items of a specified length. Is this possible?

So when n=2, the groupwise function should be equivalent to the itertools pairwise function. But when n=3, I would expect it to return groups of 3.

This is what I have so far - which is essentially the same as the pairwise function. ie. works for n=2.

Code:

from itertools import tee

my_string = 'abcdef'

def groupwise(iterable, n):
    a = tee(iterable, n)
    next(a[n-1], None)
    return zip(a[0], a[1])

for item in groupwise(my_string, 2):
    print("".join(item))

Output:

ab
bc
cd
de
ef

I am trying to modify the code above to accept n=3, n=4, n=5, ... n=x, but I cannot think of a way to provide the undetermined number of components to the zip function considering that I would like to specify any value of n <= len(my_string)



When n=3 the output should be:

abc
bcd
cde
def

Solution

  • tee isn't going to scale in this way, unfortunately. Writing pairwise takes one tee call, and if you want to do it for N elements in each group, you'll need N-1 tee calls.

    Fortunately, we can do better by just rolling the loop ourselves. What we're going to do is keep track of the last N elements that we've seen ourselves and yield them whenever we have a group that's big enough. To do that, we need a data structure that can efficiently add things to the right and subtract them from the left. collections.deque will do nicely. In fact, it even has a maxlen parameter that will automatically subtract from the left exactly when we need it to.

    
    from collections import deque
    
    def groupwise(iterable, n):
        accum = deque((), n)
        for element in iterable:
            accum.append(element)
            if len(accum) == n:
                yield tuple(accum)
    

    Construct an empty deque (the first constructor argument is the initial elements: an empty tuple) whose capacity is n. When we have more than n elements added to the right side, it will automatically drop the leftmost one. We could do this by hand, but deque will do it for us so we might as well utilize the functionality they gave us.

    Then iterate over the iterable. Append each element to the end of the deque (dropping from the left as needed to satisfy the maximum length requirement). Then, if the deque has n elements, we yield it. That makes our function a generator function, so it will produce a new iterator consisting of all of the yielded values.

    We make a shallow copy of our deque so we're not just yielding the same (mutable) deque object each time. We also make that shallow copy a tuple, rather than a deque, to be more in line with other itertools functionality.

    To call, use

    print(list(groupwise(range(10), 3)))
    # Output: [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]