Assumed: When using the python itertools.tee()
, all duplicate iterators refer to the original iterator, and the original is cached to improve performance.
My main concern in the following inquiry is regarding MY IDEA of intended/proper caching behavior.
Edit: my idea of proper caching was based on flawed functional assumptions. Will ultimately need a little wrapper around tee (which will probably have consequences regarding caching).
Lets say I create 3 iterator clones using tee: a, b, c = itertools.tee(myiter,3)
. Also assume that at this point, I remove all references to the original, myiter
(meaning, there is no great way for my code to refer back to the original hereafter).
At some later point in code, if I decided that I want another clone of myiter
, can I just re-tee() one of my duplicates? (with proper caching back to the originally cached myiter
)
In other words, at some later point, I wish I had instead used this:
a, b, c, d = itertools.tee(myiter,4)
.
But, since I have discarded all references to the original myiter
, the best I can muster would be:
copytee = itertools.tee(a, 1) #where 'a' is from a previous tee()
Does tee() know what I want here? (that I REALLY want to create a clone based on the original myiter
, NOT the intermediate clone a
(which may be partially consumed))
There's nothing magical about tee
. It's just clever ;-) At any point, tee
clones the iterator passed to it. That means the cloned iterator(s) will yield the values produced by the passed-in iterator from this point on. But it's impossible for them to reproduce values that were produced before tee
was invoked.
Let's show it with something much simpler than your example:
>>> it = iter(range(5))
>>> next(it)
0
0 is gone now - forever. tee()
can't get it back:
>>> a, b = tee(it)
>>> next(a)
1
So a
pushed it
to produce its next value. It's that value that gets cached, so that other clones can reproduce it too:
>>> next(b)
1
To get that result, it
wasn't touched - 1 was retrieved from the internal cache. And now that all of it
, a
and b
have produced 1, 1 is gone forever too.
I don't know whether that answers your question - answering "Does tee() know what I want here?" seems to require telepathy ;-) That is, I don't know what you mean by "with proper caching". It would be most helpful if you gave an exact example of the input/output behavior you're hoping for.
Short of that, the Python docs give Python code that's equivalent to tee(), and perhaps studying that would answer your question:
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
You can see from that, for example, that nothing about an iterator's internal state is cached - all that's cached is the values produced by the passed-in iterator, starting from the time tee()
is called. Each clone has its own deque
(FIFO list) of the passed-in iterator's values produced so far, and that's all the clones know about the passed-in iterator. So it may be too simple for whatever you're really hoping for.