Search code examples
pythonpandasdatetimedatetimeindex

Calculate the amount of time that a column was positive before reverting to 0


I have a table containing non-negative values against a datetimeindex, like the following:

                    CapturableSeparation
date    
2021-02-23 18:09:00 0.00
2021-02-23 18:10:00 0.00
2021-02-23 18:11:00 0.04
2021-02-23 18:12:00 0.04
2021-02-23 18:13:00 0.00
... ...
2021-02-25 23:56:00 0.00
2021-02-25 23:57:00 0.91
2021-02-25 23:58:00 0.74
2021-02-25 23:59:00 0.55

I want to create a table of the amount of time between non-consecutive 0s (amount of time that positive values persist before reverting to 0) keyed with the average value of the "CapturableSeparation" during those consecutive positive values. For the data that is visible, the table might look like:

                    AvgValue
persistence 
00:02:00            0.04
00:03:00            0.73

where the first row corresponds to the positive values at the beginning of the Dataframe that persist for 2 minutes and the second row corresponds to those at the end that persist for 3 minutes.

How should this be done?


Solution

  • Here is one way of solving the problem by identifying the consecutive blocks of non-zero values using boolean masking and cumsum:

    m = df['CapturableSeparation'].eq(0)
    b = m.cumsum()[~m]
    agg_dict = {'persistence': ('date', np.ptp), 
                'avgvalue'   : ('CapturableSeparation', 'mean')}
    
    out = df.groupby(b, as_index=False).agg(**agg_dict)
    out['persistence'] += pd.Timedelta(minutes=1)
    

    Details:

    Compare the CapturableSeparation column with 0 to create a boolean mask:

    >>> m
    
    0     True
    1     True
    2    False
    3    False
    4     True
    5     True
    6    False
    7    False
    8    False
    Name: CapturableSeparation, dtype: bool
    

    Then use cumsum on the above boolean mask to identify the blocks of consecutive non-zero values:

    >>> b
    
    2    2
    3    2
    6    4
    7    4
    8    4
    Name: CapturableSeparation, dtype: int64
    

    Group the dataframe on these consecutive blocks and aggregate the column date using np.ptp and column CapturableSeparation using mean:

    >>> out
    
          persistence  avgvalue
    0 0 days 00:02:00  0.040000
    1 0 days 00:03:00  0.733333