I have a large list (1e8+ entries) in the form [index:boolean]
. I want to find the indices of the values that are True. Performance in this task is of the utmost importance.
Currently from what I can tell in Python 3.7.2, the optimal way of doing this is using a list comprehension as follows:
return [i for i, j in enumerate(numbers) if j]
I have also tried the following (although it seems to simply be the preferred method from earlier versions of Python):
return list(filter(lambda a: a, numbers))
The second approach is about 25% slower than the first approach.
Currently, this operation takes about (0.8*x) time, where the actual logic part of my algorithm takes 'x' time. (For example if the logic takes 10 seconds, extracting the positive values from the list takes about 8 second). I had hoped this operation would be much faster.
Performance in this task is of the utmost importance
Then you should consider using a numpy array:
import numpy as np
from random import choice
from timeit import Timer
bools = True, False
li = [choice(bools) for _ in range(int(1e8))]
arr = np.array(li)
print(Timer(lambda: np.nonzero(arr)).repeat(1, 1))
Outputs
[0.4524359999999916]
That's 0.4524359999999916
seconds.