I'm looking for a way to generate an iterator that takes an iterable and simply passes its values through until a sentinel value appears twice in direct succession. It is similar to iter(a.__next__, sentinel), except that the sentinel must occur twice.
The following rather uninspired code does the trick, but surely there must be a less verbose solution?
So to put it in a concrete question:
Is there a way to avoid the fully-fledged generator and achieve the same thing using, perhaps, itertools or a generator expression?
>>> def repeat_offenders(a, sentinel):
...     ia = iter(a)
...     for x in ia:
...         if x == sentinel:
...             try:
...                 y = next(ia)
...             except StopIteration:
...                 # input ended on a single sentinel: emit it and finish
...                 yield x
...                 return
...             if y == sentinel:
...                 # two sentinels in a row: stop (return, not raise; see PEP 479)
...                 return
...             yield x
...             yield y
...         else:
...             yield x
Here are two examples:
>>> ''.join(repeat_offenders('ABCABCAABBCC', 'B'))
'ABCABCAA'
>>> ''.join(repeat_offenders('ABABAB', 'B'))
'ABABAB'
Note this question is similar but is lacking the generator angle.
You could define repeat_offenders in terms of iwindow, a sliding window recipe (which can work on any iterable, not just sequences), and the usual iter(callable, sentinel) idiom:
import itertools as IT

def iwindow(iterable, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable.
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ..., (sk, None, ..., None)
    """
    iterables = IT.tee(iterable, n)
    iterables = (IT.islice(it, pos, None) for pos, it in enumerate(iterables))
    yield from IT.zip_longest(*iterables)

def repeat_offenders(iterable, sentinel, repeat=2):
    return (item[0] for item in iter(iwindow(iterable, repeat).__next__,
                                     (sentinel,)*repeat))

print(''.join(repeat_offenders('ABCABCAABBCC', 'B', 2)))
# ABCABCAA
print(''.join(repeat_offenders('ABABAB', 'B', 2)))
# ABABAB
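To see why this works, it can help to look at the windows themselves. The sketch below repeats iwindow so that it runs on its own and prints the pairs produced for a short input; the input string 'ABBC' and the name windows are just illustrative choices.

```python
import itertools as IT

def iwindow(iterable, n=2):
    # Same recipe as above: n staggered copies of the input, zipped together.
    iterables = IT.tee(iterable, n)
    iterables = (IT.islice(it, pos, None) for pos, it in enumerate(iterables))
    yield from IT.zip_longest(*iterables)

windows = list(iwindow('ABBC'))
print(windows)
# [('A', 'B'), ('B', 'B'), ('B', 'C'), ('C', None)]
```

iter(callable, sentinel) stops as soon as a window equals ('B', 'B'), and taking item[0] of every earlier window recovers the original values. The None padding from zip_longest is what keeps the last elements of the input from being dropped.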
iwindow is a generalization of the pairwise recipe shown in the itertools docs. By writing repeat_offenders in terms of iwindow, we can generalize the concept to stopping after n repeats practically for free.
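As a rough illustration of that generalization, here is a self-contained sketch that stops only after three consecutive sentinels and also feeds in a one-shot generator instead of a string. The inputs and the variable names triple and lazy are made up for the example.

```python
import itertools as IT

def iwindow(iterable, n=2):
    # Sliding window of width n, padded with None at the end (as above).
    iterables = IT.tee(iterable, n)
    iterables = (IT.islice(it, pos, None) for pos, it in enumerate(iterables))
    yield from IT.zip_longest(*iterables)

def repeat_offenders(iterable, sentinel, repeat=2):
    windows = iwindow(iterable, repeat)
    # Stop at the first window made up entirely of sentinels.
    return (w[0] for w in iter(windows.__next__, (sentinel,) * repeat))

# Two B's in a row are passed through; three in a row end the stream.
triple = ''.join(repeat_offenders('ABBABBBAB', 'B', repeat=3))
print(triple)
# ABBA

# A one-shot generator works too, since tee() buffers what it needs.
lazy = ''.join(repeat_offenders((c for c in 'ABCABCAABBCC'), 'B'))
print(lazy)
# ABCABCAA
```

One caveat, as far as I can tell: because the final windows are padded with None, using None itself as the sentinel can end the stream too early at the very tail of the input. Passing a private fillvalue to zip_longest would be one way around that.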