Search code examples
pythonpandasdataframetime-seriesgaps-in-data

Getting a count of missing timestamps in a dataset based on expected interval


I have a dataset that is made up of timestamp readings for pressure that should be recorded and returned every 15 minutes. The example data below shows that there are gaps greater than 15 minutes in the data set

I have been trying to find a way to add in a count column that counts how many readings are missing between received ones, such that a 15-minute interval = 0 (missed readings), a half-hour gap = 1 (missed reading )and a 45-minute gap = 2, etc.

At this stage I don't have any code that I would want to show because I am still a long way off, although I have been working around this post, but still not succeeding. How Can I Detect Gaps and Consecutive Periods In A Time Series In Pandas

Any pointers would be greatly appreciated

Place             date       pressure (m)
Somewhere   01/09/2019 00:00    34
Somewhere   01/09/2019 00:30    34
Somewhere   01/09/2019 00:45    34
Somewhere   01/09/2019 01:15    34
Somewhere   01/09/2019 01:30    34
Somewhere   01/09/2019 02:15    34
Somewhere   01/09/2019 02:30    34
Somewhere   01/09/2019 02:45    34
Somewhere   01/09/2019 03:15    34
Somewhere   01/09/2019 03:30    34
Somewhere   01/09/2019 03:45    34.5
Somewhere   01/09/2019 04:00    34
Somewhere   01/09/2019 04:15    34
Somewhere   01/09/2019 06:45    33.5
Somewhere   01/09/2019 07:00    33.5
Somewhere   01/09/2019 07:30    34

Solution

  • Use, pd.to_datetime to convert the date column into pandas datetime series, then use Series.diff to calculate the successive differences between the dates, then divide this differences by pd.Timedelta with interval of 15min, finally use .fillna to fill the NaN values with 0:

    df['date'] = pd.to_datetime(df['date'])
    df['gap'] = (df['date'].diff() / pd.Timedelta(minutes=15)).sub(1).fillna(0)
    

    Result:

    # print(df)
    
            Place                date  pressure (m)  gap
    0   Somewhere 2019-01-09 00:00:00          34.0  0.0
    1   Somewhere 2019-01-09 00:30:00          34.0  1.0
    2   Somewhere 2019-01-09 00:45:00          34.0  0.0
    3   Somewhere 2019-01-09 01:15:00          34.0  1.0
    4   Somewhere 2019-01-09 01:30:00          34.0  0.0
    5   Somewhere 2019-01-09 02:15:00          34.0  2.0
    6   Somewhere 2019-01-09 02:30:00          34.0  0.0
    7   Somewhere 2019-01-09 02:45:00          34.0  0.0
    8   Somewhere 2019-01-09 03:15:00          34.0  1.0
    9   Somewhere 2019-01-09 03:30:00          34.0  0.0
    10  Somewhere 2019-01-09 03:45:00          34.5  0.0
    11  Somewhere 2019-01-09 04:00:00          34.0  0.0
    12  Somewhere 2019-01-09 04:15:00          34.0  0.0
    13  Somewhere 2019-01-09 06:45:00          33.5  9.0
    14  Somewhere 2019-01-09 07:00:00          33.5  0.0
    15  Somewhere 2019-01-09 07:30:00          34.0  1.0