The Python documentation for itertools provides the following "recipe" for advancing an iterator n steps:
def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
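A quick sanity check of the recipe's behavior (run under Python 3 here, with the imports the recipe assumes):

```python
import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)
print(next(it))          # 3: exactly three elements were skipped
consume(it, None)
print(next(it, 'done'))  # 'done': the iterator is now exhausted
```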
I'm wondering why this recipe is fundamentally different from something like this (aside from the handling of consuming the whole iterator):
def other_consume(iterable, n):
    for i in xrange(n):
        next(iterable, None)
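Ported to Python 3 (range in place of xrange), the slow version leaves the iterator in exactly the same state as the recipe, so the difference really is only speed:

```python
def other_consume(iterable, n):
    # same effect as consume(iterable, n), just done one next() call at a time
    for i in range(n):  # xrange in Python 2
        next(iterable, None)

it = iter(range(10))
other_consume(it, 3)
print(next(it))  # 3, just as after consume(it, 3)
```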
I used timeit to confirm that, as expected, the above approach is much slower. What's going on in the recipe that allows for this superior performance? I get that it uses islice, but looking at islice, it APPEARS to be doing fundamentally the same thing as the code above:
def islice(iterable, *args):
    s = slice(*args)
    it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
    nexti = next(it)
    ### it seems as if this loop yields from the iterable n times via enumerate
    ### how is this different from calling next n times?
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            nexti = next(it)
Note: even if, instead of importing islice from itertools, I define it using the pure-Python equivalent from the docs shown above, the recipe is still faster.
EDIT: timeit code here:
timeit.timeit('a = iter([random() for i in xrange(1000000)]); consume(a, 1000000)', setup="from __main__ import consume,random", number=10)
timeit.timeit('a = iter([random() for i in xrange(1000000)]); other_consume(a, 1000000)', setup="from __main__ import other_consume,random", number=10)
other_consume is ~2.5x slower each time I run this.
The pure-Python equivalent of itertools.islice() given in the documentation is flawed: it doesn't handle the edge case start == stop properly. It is exactly that edge case that consume() relies on.
For islice(it, n, n), exactly n elements are consumed from it, but nothing is ever yielded. Instead, StopIteration is raised after those n elements have been consumed.
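You can see the C implementation's behavior on the start == stop edge case directly (Python 3 syntax here):

```python
from itertools import islice

it = iter(range(10))
empty = islice(it, 3, 3)          # empty slice starting at position 3
print(next(empty, 'exhausted'))   # 'exhausted': nothing is yielded...
print(next(it))                   # 3: ...but three elements were consumed from it
```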
The Python version you used to test with, on the other hand, raises StopIteration immediately without ever consuming anything from it. This makes any timings against this pure-Python version incorrect and way too fast. This is because the xrange(n, n, 1) iterator immediately raises StopIteration:
>>> it = iter(xrange(1, 1))
>>> print next(it)
Traceback (most recent call last):
File "prog.py", line 4, in <module>
print next(it)
StopIteration
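To see the flaw end to end, here is the docs equivalent ported to Python 3 (range for xrange, sys.maxsize for sys.maxint, plus try/except guards because PEP 479 turns a StopIteration escaping a generator body into RuntimeError on Python 3.7+). For start == stop it consumes nothing at all:

```python
import sys

def islice_py(iterable, *args):
    # Port of the old pure-Python islice equivalent from the docs.
    s = slice(*args)
    it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
    try:
        # raises StopIteration immediately when start == stop,
        # before anything is pulled from iterable
        nexti = next(it)
    except StopIteration:
        return
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            try:
                nexti = next(it)
            except StopIteration:
                return

source = iter(range(10))
next(islice_py(source, 3, 3), None)  # the consume() idiom, against the port
print(next(source))                  # 0: nothing was consumed from source
```

So the C islice advances the underlying iterator n times before stopping, while the documented equivalent bails out before touching it; the benchmark against the pure-Python version was measuring a no-op.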