I have an iterator iterator
and a list of indices indices
(repeats possible) and I want to extract just those elements from my iterator. At the moment I'm doing
indices = sorted(indices)
deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]
output = []
for delta in deltas:
for i in range(delta):
datum = next(iterator)
output.append(datum)
Are those two layers of loop necessary? Am I missing a trick with itertools
?
You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:
it = iter(sorted(indices))
index = next(it)
for i, datum in enumerate(iterator):
if i != index:
continue
output.append(datum)
try:
index = next(it)
except StopIteration:
break
You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the sort
):
[datum for i, datum in enumerate(x) if i in indices]
You can reduce the cost of the check by converting indices
to a set
. I would be interested to see performance of sort
over a set
construction (set lookup is O(1)):
indices = set(indices)
[datum for i, datum in enumerate(x) if i in indices]
First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.