Tags: python, matlab, numpy, threshold

Finding first samples greater than a threshold value efficiently in Python (and MATLAB comparison)


Rather than finding all the samples in a list or array that are greater than a particular threshold, I would like to find only the first sample each time the signal rises above the threshold. The signal may cross the threshold several times. For example, given the signal:

signal = [1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0]

and a threshold = 2, then

signal = numpy.array(signal)
is_bigger_than_threshold = signal > threshold

would give me a boolean mask of all values in signal which are greater than threshold. However, I would like to get only the first sample each time signal becomes greater than threshold. Therefore, I go through the whole list and make boolean comparisons like

first_bigger_than_threshold = list()
first_bigger_than_threshold.append(False)
for i in xrange(1, len(is_bigger_than_threshold)):
    if(is_bigger_than_threshold[i] == False):
        val = False
    elif(is_bigger_than_threshold[i]):
        if(is_bigger_than_threshold[i - 1] == False):
            val = True
        elif(is_bigger_than_threshold[i - 1] == True):
            val = False
    first_bigger_than_threshold.append(val)

This gives me the result I was looking for, namely

[False, False, True, False, False, False, False, False, False, True, False, False, False,   
False, False, False, True, False, False, False, False, False]

In MATLAB I would do something similar:

for i = 2 : numel(is_bigger_than_threshold)
    if(is_bigger_than_threshold(i) == 0)
        val = 0;
    elseif(is_bigger_than_threshold(i))
        if(is_bigger_than_threshold(i - 1) == 0)
            val = 1;
        elseif(is_bigger_than_threshold(i - 1) == 1)
            val = 0;
        end
    end
    first_bigger_than_threshold(i) = val;
end % for

Is there a more efficient (faster) way to perform this calculation?

If I generate data in Python, e.g.

signal = [round(random.random() * 10) for i in xrange(0, 1000000)]

and time it, calculating these values takes 4.45 seconds. If I generate data in MATLAB

signal = round(rand(1, 1000000) * 10);

and execute the program, it takes only 0.92 seconds.

Why is MATLAB almost 5 times quicker than Python at performing this task?

Thanks in advance for your comments!


Solution

  • The other answers give you the positions of the first Trues. If you want a bool array that marks the first True of each run instead, you can do it faster:

    import numpy as np
    
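    signal = np.random.rand(1000000)
    # True wherever the signal exceeds the threshold
    th = signal > 0.5
    # Clear every True that directly follows another True, so only the first of each run stays
    th[1:][th[:-1] & th[1:]] = False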
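
To check this against the example signal from the question, here is a minimal sketch that wraps the same idea in a helper. The name first_crossings is hypothetical, not something from the answer. It uses an equivalent, non-in-place formulation: a sample is a first crossing when it is above the threshold and its predecessor is not. One difference from the loop in the question: the loop always marks index 0 as False, whereas this sketch (like the snippet above) leaves index 0 True if the very first sample is already above the threshold.

```python
import numpy as np

def first_crossings(signal, threshold):
    """Return a bool array marking the first sample of every run above threshold.

    Hypothetical helper for illustration; not part of NumPy.
    """
    above = np.asarray(signal) > threshold
    first = above.copy()
    # Keep a True only where the previous sample was not above the threshold.
    first[1:] &= ~above[:-1]
    return first

signal = [1, 2, 3, 4, 4, 3, 2, 1, 0, 3, 2, 1, 0, 0, 1, 1, 4, 8, 7, 6, 5, 0]
mask = first_crossings(signal, 2)

# Indices where the signal first rises above the threshold
print(np.flatnonzero(mask))
```

If only the positions are needed, np.flatnonzero(mask) converts the mask to indices; for the example signal these are the same three positions that are True in the expected output listed in the question.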