Tags: python, iterator, generator, python-itertools, generator-expression

Concise way of stopping iteration when sentinel occurs twice in a row


I'm looking for a way to build an iterator that takes an iterable and passes its values through until a sentinel value appears twice in direct succession. This is similar to iter(a.__next__, sentinel), except that the sentinel must occur twice in a row.
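
For reference, a minimal sketch of the single-sentinel idiom mentioned above. The helper name until_first is hypothetical and only wraps the built-in two-argument form of iter:

```python
def until_first(iterable, sentinel):
    # until_first is a hypothetical name used for illustration only.
    it = iter(iterable)
    # iter(callable, sentinel) calls it.__next__ repeatedly and stops
    # as soon as the returned value equals the sentinel.
    return iter(it.__next__, sentinel)

print(''.join(until_first('ABCABCAABBCC', 'B')))
# A
```

This stops at the very first 'B', whereas the behaviour asked for here should only stop at 'BB'.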

The following rather uninspired code does the trick, but surely there must be a less verbose solution?

So to put it in a concrete question:

Is there a way to avoid the fully-fledged generator and achieve the same using perhaps itertools or a generator expression?

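>>> def repeat_offenders(a, sentinel):
...    ia = iter(a)
...    for x in ia:
...       if x == sentinel:
...          try:
...             y = next(ia)  # look ahead one item
...          except StopIteration:
...             # a lone sentinel at the very end is passed through
...             yield x
...             return
...          if y == sentinel:
...             # return, not raise StopIteration: since PEP 479 (Python 3.7+)
...             # raising it inside a generator becomes a RuntimeError
...             return
...          yield x
...          yield y
...       else:
...          yield x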

Here are two examples:

>>> ''.join(repeat_offenders('ABCABCAABBCC', 'B'))
'ABCABCAA'
>>> ''.join(repeat_offenders('ABABAB', 'B'))
'ABABAB'

Note this question is similar but is lacking the generator angle.


Solution

  • You could define repeat_offenders in terms of iwindow, a sliding window recipe (which can work on any iterable, not just sequences), and the usual iter(callable, sentinel) idiom:

    import itertools as IT
    
    def iwindow(iterable, n=2):
        """
        Returns a sliding window (of width n) over data from the iterable.
        s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ..., (sk, None, ..., None)
        """
        iterables = IT.tee(iterable, n)
        iterables = (IT.islice(it, pos, None) for pos, it in enumerate(iterables))
        yield from IT.zip_longest(*iterables)
    
    def repeat_offenders(iterable, sentinel, repeat=2):
        return (item[0] for item in iter(iwindow(iterable, repeat).__next__, 
                                         (sentinel,)*repeat))
    
    print(''.join(repeat_offenders('ABCABCAABBCC', 'B', 2)))
    # ABCABCAA
    
    print(''.join(repeat_offenders('ABABAB', 'B', 2)))
    # ABABAB
    

    iwindow is a generalization of the pairwise recipe shown in the itertools docs. By writing repeat_offenders in terms of iwindow, we can generalize the concept to stopping after n repeats practically for free.
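
As a quick check of the n-repeat claim, here is a sketch with repeat=3. It also shows one possible way around an edge case of the code above: zip_longest pads the trailing windows with None by default, so with sentinel=None a single trailing None would look like a doubled sentinel and get dropped. The _PAD fill value and the fillvalue parameter on iwindow are assumptions added for this illustration, not part of the original answer.

```python
import itertools as IT

# Hypothetical private fill value; a fresh object() compares unequal
# to ordinary values, including None.
_PAD = object()

def iwindow(iterable, n=2, fillvalue=_PAD):
    """Sliding window of width n; trailing windows are padded with fillvalue."""
    iterables = IT.tee(iterable, n)
    iterables = (IT.islice(it, pos, None) for pos, it in enumerate(iterables))
    yield from IT.zip_longest(*iterables, fillvalue=fillvalue)

def repeat_offenders(iterable, sentinel, repeat=2):
    windows = iwindow(iterable, repeat)
    stop = (sentinel,) * repeat
    # iter(callable, stop) ends as soon as a window equals `stop`.
    return (window[0] for window in iter(windows.__next__, stop))

print(''.join(repeat_offenders('ABBABBBAB', 'B', 3)))
# ABBA

print(list(repeat_offenders([1, None], None)))
# [1, None]
```

In the first call the doubled 'BB' passes through and iteration stops only at 'BBB'. In the second, the lone trailing None survives because the padded window is (None, _PAD) rather than (None, None).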