
Counting contiguous NaN values in a pandas time series


I am currently working on time series in Python 3 with pandas, and I want to summarize the periods of contiguous missing values, but so far I can only find the indexes of the individual NaN values...

Sample data:
                     Valeurs
2018-01-01 00:00:00      1.0
2018-01-01 04:00:00      NaN
2018-01-01 08:00:00      2.0
2018-01-01 12:00:00      NaN
2018-01-01 16:00:00      NaN
2018-01-01 20:00:00      5.0
2018-01-02 00:00:00      6.0
2018-01-02 04:00:00      7.0
2018-01-02 08:00:00      8.0
2018-01-02 12:00:00      9.0
2018-01-02 16:00:00      5.0
2018-01-02 20:00:00      NaN
2018-01-03 00:00:00      NaN
2018-01-03 04:00:00      NaN
2018-01-03 08:00:00      1.0
2018-01-03 12:00:00      2.0
2018-01-03 16:00:00      NaN
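To make the example reproducible, the sample can be rebuilt as a DataFrame. This is a sketch that assumes the rows are spaced exactly 4 hours apart (as in the listing) and keeps the column name `Valeurs`:

```python
import numpy as np
import pandas as pd

# 17 observations at an assumed regular 4-hour step starting 2018-01-01 00:00;
# NaN marks the missing values.
index = pd.date_range("2018-01-01", periods=17, freq=pd.offsets.Hour(4))
values = [1.0, np.nan, 2.0, np.nan, np.nan, 5.0, 6.0, 7.0, 8.0, 9.0,
          5.0, np.nan, np.nan, np.nan, 1.0, 2.0, np.nan]
df = pd.DataFrame({"Valeurs": values}, index=index)
print(df)
```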

Expected result:
       Start_Date      number of contiguous missing values 
2018-01-01 04:00:00      1
2018-01-01 12:00:00      2
2018-01-02 20:00:00      3
2018-01-03 16:00:00      1

How can I get this kind of result with pandas? Should I be using shift(), cumsum(), groupby()?

Thank you for your advice!

Sylvain


Solution

  • groupby and agg

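    # df: the sample data above (DatetimeIndex plus a 'Valeurs' column)
    mask = df.Valeurs.isna()
    # (~mask).cumsum() only increases on non-null rows, so all NaNs of one
    # contiguous run share the same value and end up in the same group
    d = df.index.to_series()[mask].groupby((~mask).cumsum()[mask]).agg(['first', 'size'])
    d.rename(columns=dict(size='num of contig null', first='Start_Date')).reset_index(drop=True)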
    
                Start_Date  num of contig null
    0  2018-01-01 04:00:00                   1
    1  2018-01-01 12:00:00                   2
    2  2018-01-02 20:00:00                   3
    3  2018-01-03 16:00:00                   1
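A self-contained sketch of the approach above, run on the question's sample data (assuming a regular 4-hour step). It also includes a `shift()`-based variant, since the question mentions `shift()`. That variant is a swapped-in alternative, not part of the original answer: it flags the first NaN of each run and numbers the runs with `cumsum()`. `summarize_runs` is a hypothetical helper introduced only to avoid repeating the groupby chain.

```python
import numpy as np
import pandas as pd

# Sample data from the question; a regular 4-hour step is assumed.
index = pd.date_range("2018-01-01", periods=17, freq=pd.offsets.Hour(4))
values = [1.0, np.nan, 2.0, np.nan, np.nan, 5.0, 6.0, 7.0, 8.0, 9.0,
          5.0, np.nan, np.nan, np.nan, 1.0, 2.0, np.nan]
df = pd.DataFrame({"Valeurs": values}, index=index)

mask = df["Valeurs"].isna()


def summarize_runs(run_key):
    """Hypothetical helper: first timestamp and size of each NaN run.

    `run_key` must give every NaN of the same run the same value.
    """
    return (
        df.index.to_series()[mask]
        .groupby(run_key[mask])
        .agg(["first", "size"])
        .rename(columns={"first": "Start_Date", "size": "num of contig null"})
        .reset_index(drop=True)
    )


# Answer's key: running count of non-null rows. It does not change inside a
# run of NaNs, so the NaN rows get the keys 1, 2, 2, 8, 8, 8, 10.
cumsum_key = (~mask).cumsum()

# shift() variant: a run starts at a NaN whose predecessor is not NaN
# (fill_value=False treats the row before the first one as non-NaN);
# cumsum() of those start flags numbers the runs 1, 2, 3, 4.
starts = mask & ~mask.shift(fill_value=False)
shift_key = starts.cumsum()

runs = summarize_runs(cumsum_key)
print(runs)
print(summarize_runs(shift_key))
```

Both keys only have to be constant within a run and different between runs, which is why the answer's key can skip numbers (1, 2, 8, 10) while the `shift()` key is consecutive.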