Search code examples
pythonlistoptimizationmicro-optimization

Optimal List Comprehension (Filtering Existing List)


I have a large list (1e8+ entries) in the form [index:boolean]. I want to find the indices of the values that are True. Performance in this task is of the utmost importance.

Currently from what I can tell in Python 3.7.2, the optimal way of doing this is using a list comprehension as follows:

return [i for i, j in enumerate(numbers) if j]

I have also tried the following (although it seems to simply be the preferred method from earlier versions of Python):

return list(filter(lambda a: a, numbers))

The second approach is about 25% slower than the first approach.

Currently, this operation takes about (0.8*x) time, where the actual logic part of my algorithm takes 'x' time. (For example if the logic takes 10 seconds, extracting the positive values from the list takes about 8 second). I had hoped this operation would be much faster.


Solution

  • Performance in this task is of the utmost importance

    Then you should consider using a numpy array:

    import numpy as np
    from random import choice
    from timeit import Timer
    
    bools = True, False
    li = [choice(bools) for _ in range(int(1e8))]
    arr = np.array(li)  
    
    print(Timer(lambda: np.nonzero(arr)).repeat(1, 1))
    

    Outputs

    [0.4524359999999916]
    

    That's 0.4524359999999916 seconds.