python numpy statistics frequency loss

How to find the number of losses in a series


I have an array like [0, 0, 0, 1, 2, -1, -2, 5, 8, 4, 5.5] and want to find the number of times an element is less than the previous one (ignoring the zeros at the beginning that represent missing data).

In this example, the answer is 3 (for -1, -2 and 4), which I then divide by the number of valid numbers (here 8). The expected result is 3/8 = 0.375.

I wrote a one-liner and wonder whether there is a more efficient way to do this, since I have to run it millions of times.

My current code:

import numpy as np

v = np.array([0, 0, 0, 1, 2, -1, -2, 5, 8, 4, 5.5])
print(np.sum((v < np.roll(v,1))[1:]) / np.sum(v != 0)) # loss frequency

Any clue?

Note: once the first valid number (i.e. one different from 0) appears, all following numbers (including 0) are valid.


Solution

  • With np.trim_zeros (to trim leading zeros) and simple subtraction:

    v_trimmed = np.trim_zeros(v, 'f')
    np.sum((v_trimmed[1:] - v_trimmed[:-1]) < 0) / len(v_trimmed)
    

    Or with np.diff for subtraction:

    np.sum(np.diff(v_trimmed) < 0) / len(v_trimmed)
    

    0.375
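
For repeated use, the trim-and-diff approach can be wrapped in a small function. This is only a sketch: the name loss_frequency and the choice to return NaN when the series contains no valid data are assumptions, not part of the original answer. Because only leading zeros are trimmed, a zero that appears after the first valid number still counts as valid, as the question's note requires.

```python
import numpy as np

def loss_frequency(v):
    """Fraction of valid entries that are lower than their predecessor."""
    # Drop leading zeros only ('f' = front); later zeros stay valid.
    v = np.trim_zeros(np.asarray(v, dtype=float), 'f')
    if v.size == 0:
        # Assumption: a series with no valid data has no defined frequency.
        return float('nan')
    # np.diff(v)[i] = v[i + 1] - v[i]; a negative value marks a loss.
    return np.count_nonzero(np.diff(v) < 0) / v.size

print(loss_frequency([0, 0, 0, 1, 2, -1, -2, 5, 8, 4, 5.5]))  # 0.375
```

For an input with an interior zero such as [0, 1, 0, 2], the trimmed series is [1, 0, 2], giving one loss out of three valid numbers.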