python list optimization micro-optimization

Optimal List Comprehension (Filtering Existing List)

I have a large list (1e8+ entries) in the form [index:boolean]. I want to find the indices of the values that are True. Performance in this task is of the utmost importance.

Currently from what I can tell in Python 3.7.2, the optimal way of doing this is using a list comprehension as follows:

return [i for i, j in enumerate(numbers) if j]

I have also tried the following (although it seems to simply be the preferred method from earlier versions of Python):

return list(filter(lambda a: a, numbers))

The second approach is about 25% slower than the first approach.

Currently, this operation takes about (0.8*x) time, where the actual logic part of my algorithm takes 'x' time. (For example if the logic takes 10 seconds, extracting the positive values from the list takes about 8 second). I had hoped this operation would be much faster.

Solution

Performance in this task is of the utmost importance

Then you should consider using a numpy array:

import numpy as np
from random import choice
from timeit import Timer

bools = True, False
li = [choice(bools) for _ in range(int(1e8))]
arr = np.array(li)  

print(Timer(lambda: np.nonzero(arr)).repeat(1, 1))

Outputs

[0.4524359999999916]

That's 0.4524359999999916 seconds.