Search code examples
pythoniteratorpython-itertools

Index into a Python iterator


I have an iterator iterator and a list of indices indices (repeats possible) and I want to extract just those elements from my iterator. At the moment I'm doing

indices = sorted(indices)
deltas = [indices[0]] + [indices[i+1] - indices[i] for i in range(len(indices) - 1)]
output = []
for delta in deltas:
    for i in range(delta):
        datum = next(iterator)
    output.append(datum)

Are those two layers of loop necessary? Am I missing a trick with itertools?


Solution

  • You definitely don't need the double loop as you can do this with a single loop and without creating deltas but the check code becomes more complicated:

    it = iter(sorted(indices))
    index = next(it)
    for i, datum in enumerate(iterator):
        if i != index:
            continue
        output.append(datum)
        try:
            index = next(it)
        except StopIteration:
            break
    

    You can also do this in a list comprehension for very low number of indices as you incur overhead for the check (but avoid the sort):

    [datum for i, datum in enumerate(x) if i in indices]
    

    You can reduce the cost of the check by converting indices to a set. I would be interested to see performance of sort over a set construction (set lookup is O(1)):

    indices = set(indices)
    [datum for i, datum in enumerate(x) if i in indices]
    

    First and third options are roughly equivalent in timing at just over 900 ms (slight edge to first) to choose 1000 random indices out of 10,000,000 items. The OP's code ran in about 1.2 s.