python, pandas, numpy, enumerate, cumsum

How to find the number of events or pulses recorded when the value is greater than 0 in Python?


I have a data set that represents hourly rainfall over one day. I'm creating a column 'E1' that should start at zero and increment each time column 'value' becomes greater than zero, keep that number while 'value' stays above zero, and stop when 'value' becomes zero again. At the next event the numbering must continue from where it left off. My attempt so far, stored in column 'E2':

condition = ((df['value'] > 0) & (df['value'].shift(periods=1) == 0))

df['E2'] = (condition).cumsum()
print(df)
    hour  value  E2
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   1
5      5    0.2   2
6      6    0.2   2
7      7    0.0   2
8      8    NaN   2
9      9    0.2   2
10    10    0.0   2
11    11    0.0   2
12    12    0.2   3
13    13    0.2   3
14    14    0.0   3
15    15    NaN   3
16    16    0.2   3
17    17    0.0   3
18    18    0.2   4
19    19    0.0   4
20    20    0.2   5
21    21    0.2   5
22    22    NaN   5
23    23    0.0   5

E1 represents the event number. An event can last one or several hours, and it should only be counted when the cell immediately before the start of the event is zero and the cell immediately after its last value is zero.

I'm stuck trying to number the events. I should get:

    hour  value  E2
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   0
5      5    0.2   2
6      6    0.2   2
7      7    0.0   0
8      8    NaN   0
9      9    0.2   0
10    10    0.0   0
11    11    0.0   0
12    12    0.2   3
13    13    0.2   3
14    14    0.0   0
15    15    NaN   0
16    16    0.2   0
17    17    0.0   0
18    18    0.2   4
19    19    0.0   0
20    20    0.2   0
21    21    0.2   0
22    22    NaN   0
23    23    0.0   0

Solution

  • I find this an odd criterion, but here's how to compute your "event" numbers. Because each event depends on both the value before it and the value after it, a single cumsum expression isn't enough; a plain loop is the most straightforward way to do it.

    import numpy as np
    import pandas as pd
    
    data = [
      0.0,
      0.2,
      0.2,
      0.2,
      0.0,
      0.2,
      0.2,
      0.0,
      np.nan,
      0.2,
      0.0,
      0.0,
      0.2,
      0.2,
      0.0,
      np.nan,
      0.2,
      0.0,
      0.2,
      0.0,
      0.2,
      0.2,
      np.nan,
      0.0
    ]
    
    df = pd.DataFrame({'data': data})
    print(df)
    
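    nxt = 1                                      # number to give the next event
    nums = np.zeros(len(df['data']), dtype=int)  # 0 means "not part of an event"
    start = None                                 # index where a candidate event would begin
    for ndx, v in enumerate(df['data']):
        if np.isnan(v):
            # A gap invalidates any candidate event in progress.
            start = None
        elif not v:
            # A zero closes the candidate if it began right after a zero and is non-empty.
            if start is not None and start < ndx:
                nums[start:ndx] = nxt
                nxt += 1
            # The next event can only begin right after this zero.
            start = ndx + 1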
    
    df['E1'] = nums
    print(df)
    

    Output:

        data
    0    0.0
    1    0.2
    2    0.2
    3    0.2
    4    0.0
    5    0.2
    6    0.2
    7    0.0
    8    NaN
    9    0.2
    10   0.0
    11   0.0
    12   0.2
    13   0.2
    14   0.0
    15   NaN
    16   0.2
    17   0.0
    18   0.2
    19   0.0
    20   0.2
    21   0.2
    22   NaN
    23   0.0
        data  E1
    0    0.0   0
    1    0.2   1
    2    0.2   1
    3    0.2   1
    4    0.0   0
    5    0.2   2
    6    0.2   2
    7    0.0   0
    8    NaN   0
    9    0.2   0
    10   0.0   0
    11   0.0   0
    12   0.2   3
    13   0.2   3
    14   0.0   0
    15   NaN   0
    16   0.2   0
    17   0.0   0
    18   0.2   4
    19   0.0   0
    20   0.2   0
    21   0.2   0
    22   NaN   0
    23   0.0   0
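
The same numbering can also be computed without an explicit loop. What follows is a sketch and not part of the original answer. The function name number_events is made up for illustration, and it assumes the values are never negative: the loop above would treat a negative value as part of an event, while this version would not.

```python
import numpy as np
import pandas as pd


def number_events(s: pd.Series) -> pd.Series:
    """Number runs of positive values that have a zero on both sides."""
    pos = s > 0    # NaN compares as False
    zero = s == 0  # NaN compares as False

    # Give every run of consecutive identical `pos` flags its own id.
    run_id = pos.ne(pos.shift(fill_value=False)).cumsum()

    # Is the row directly before / after this one a zero?
    prev_zero = zero.shift(1, fill_value=False)
    next_zero = zero.shift(-1, fill_value=False)

    # A positive run qualifies if its first row follows a zero and its
    # last row precedes a zero; spread that verdict over the whole run.
    starts_ok = (pos & prev_zero).groupby(run_id).transform("any")
    ends_ok = (pos & next_zero).groupby(run_id).transform("any")
    valid = pos & starts_ok & ends_ok

    # Count the first row of each qualifying run, then zero out other rows.
    first_row = valid & prev_zero
    return first_row.cumsum().where(valid, 0).astype(int)


values = [0.0, 0.2, 0.2, 0.2, 0.0, 0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.0,
          0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.2, 0.0, 0.2, 0.2, np.nan, 0.0]
df = pd.DataFrame({"data": values})
df["E1"] = number_events(df["data"])
n_events = int(df["E1"].max())  # highest event number = number of events
print(df)
print("events:", n_events)
```

On the sample data this produces the same E1 column as the loop, and the largest value in E1 (4 here) is the number of events counted.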