Tags: python, arrays, numpy, masking, bitmask

How can I do this python list comprehension in NumPy?


Let's say I have an array, r, whose values range from 0 to 1. I want to remove all values that are more than some threshold away from the median. Let's assume the threshold is 0.5 and len(r) = 3000. Then, to mask out all values outside this range, I can use a simple list comprehension, which I like:

mask = np.array([ri < np.median(r)-0.5 or ri > np.median(r)+0.5 for ri in r])

And if I use a timer on it:

import time
import numpy as np

start = time.time()
r = np.random.random(3000)
m = np.median(r)
minr, maxr = m - 0.5, m + 0.5   # lower and upper bounds around the median
mask = [ri<minr or ri>maxr for ri in r]
end = time.time()
print('Took %.4f seconds'%(end-start))

>>> Took 0.0010 seconds

Is there a faster way to do this list comprehension and make the mask using NumPy?


Edit:

I've tried several suggestions below, including:

  • An element-wise or operator: (r < minr) | (r > maxr)

  • NumPy's logical or: r[np.logical_or(r < minr, r > maxr)]

  • An absolute-difference boolean array: abs(m - r) > 0.5
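As a quick sanity check, the three suggestions can be compared against the original list comprehension on the same data. This is a sketch that assumes the bounds `minr = m - 0.5` and `maxr = m + 0.5`, a fixed random seed, and `np.abs` for the absolute difference; the `np.logical_or` variant is shown here as a plain mask, without the `r[...]` indexing from the list above.

```python
import numpy as np

# Fixed seed so the comparison is reproducible (any data in [0, 1) would do).
rng = np.random.default_rng(0)
r = rng.random(3000)

m = np.median(r)
minr, maxr = m - 0.5, m + 0.5

# Reference: the original list comprehension, converted to a boolean array.
mask_listcomp = np.array([ri < minr or ri > maxr for ri in r])

# 1. Element-wise | on two boolean arrays.
mask_or = (r < minr) | (r > maxr)

# 2. np.logical_or computes the same element-wise result as |.
mask_logical = np.logical_or(r < minr, r > maxr)

# 3. Distance from the median compared against the threshold.
mask_abs = np.abs(r - m) > 0.5

print(np.array_equal(mask_listcomp, mask_or))
print(np.array_equal(mask_or, mask_logical))
print(np.array_equal(mask_or, mask_abs))
```

All three vectorized forms describe the same set of values, so the choice between them comes down to readability and speed.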

And here is the average time each one took over 300 runs:

Python list comprehension: 0.6511 ms
Element-wise or: 0.0138 ms
NumPy logical or: 0.0241 ms
Absolute difference: 0.0248 ms

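As you can see, the element-wise or was consistently the fastest: nearly twice as fast as the other two NumPy approaches, and about 47 times faster than the list comprehension. (I don't know how that scales with the number of array elements.) Note that the logical-or variant as written also indexes r, so it is not a strictly like-for-like comparison. Who knew.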
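To reproduce a comparison like this, the standard-library `timeit` module is usually more reliable than wrapping a single call in `time.time()`, because it repeats the call and uses a high-resolution timer (`time.perf_counter` by default). A sketch, assuming the same 3000-element array and 300 calls per measurement; the repeat count of 3 and keeping the fastest repeat are choices made here, not part of the original post, and your numbers will depend on hardware and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
r = rng.random(3000)  # try e.g. rng.random(300_000) to see how the gap scales
m = np.median(r)
minr, maxr = m - 0.5, m + 0.5

# Each candidate builds the same boolean mask; only the method differs.
candidates = {
    "list comprehension": lambda: np.array([ri < minr or ri > maxr for ri in r]),
    "element-wise |": lambda: (r < minr) | (r > maxr),
    "np.logical_or": lambda: np.logical_or(r < minr, r > maxr),
    "absolute difference": lambda: np.abs(r - m) > 0.5,
}

N_RUNS = 300
timings_ms = {}
for name, fn in candidates.items():
    # timeit.repeat returns the total seconds for N_RUNS calls, once per repeat;
    # the fastest repeat is the least affected by other processes.
    best = min(timeit.repeat(fn, number=N_RUNS, repeat=3))
    timings_ms[name] = best / N_RUNS * 1000  # milliseconds per call

for name, ms in timings_ms.items():
    print(f"{name:>20}: {ms:.4f} ms")
```

Unlike the list in the question, every candidate here produces only the mask, so the timings compare the same amount of work.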


Solution

  • You can use NumPy boolean indexing to create a new array without those values.

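    import time
    import numpy as np

    r = np.random.random(3000)
    start = time.time()
    m = np.median(r)
    minr, maxr = m - 0.5, m + 0.5
    mask = (r < minr) | (r > maxr)   # True where r is more than 0.5 from the median
    filtered_array = r[~mask]        # keep only the values the mask does not flag
    end = time.time()
    print('Took %.4f seconds' % (end - start))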
    

    filtered_array is a new array (a copy, not a view or slice of r) containing only the values of r that the mask does not flag; every value the mask marks for removal is already left out.

    Update: used shorter syntax suggested by @ayhan.
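To drop the flagged values, the mask has to be inverted with `~` before indexing; indexing with the mask itself returns the values to be removed. The sketch below contrasts plain boolean indexing with NumPy's masked arrays (`np.ma`), which the question's "mask out" wording and tags hint at. It assumes the absolute-difference form of the mask and a fixed seed; `np.ma` is offered as an alternative, not something the answer proposed.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random(3000)
m = np.median(r)
mask = np.abs(r - m) > 0.5   # True = value to remove

# Option 1: boolean indexing returns a new array (a copy), not a view of r.
kept = r[~mask]
removed = r[mask]

# Option 2: a masked array keeps r's shape and hides the flagged values;
# .compressed() returns the unmasked values as a plain 1-D ndarray.
r_masked = np.ma.masked_array(r, mask=mask)
kept_from_ma = r_masked.compressed()

print(np.array_equal(kept, kept_from_ma))   # same values either way
print(np.shares_memory(kept, r))            # boolean indexing copies the data
print(kept.size + removed.size == r.size)   # every value lands in exactly one group
```

A masked array is handy when later steps need the original shape and positions (for example, `r_masked.mean()` ignores masked entries); plain boolean indexing is simpler when you only need the surviving values.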