Add missing timestamp row to a dataframe


I have a dataframe containing measurements taken at two-hour intervals each day, but some intervals are missing. My dataset looks like this:

2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0

I'm trying to insert the missing time intervals and fill their values with NaN:

2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0

I would appreciate any help on how to achieve this in Python, as I'm a newbie just starting out with the language.


Solution

  • Assuming your df looks like

                  datetime  value
    0  2020-12-01T08:00:00  145.9
    1  2020-12-01T10:00:00  100.0
    2  2020-12-01T16:00:00   99.3
    3  2020-12-01T18:00:00   91.0
    

    make sure the datetime column has a datetime dtype:

    df['datetime'] = pd.to_datetime(df['datetime'])
    

    so that you can resample to a 2-hourly frequency (on pandas 2.2 and later the alias is spelled '2h', and '2H' is deprecated):

    df.resample('2H', on='datetime').mean()
    
                         value
    datetime                  
    2020-12-01 08:00:00  145.9
    2020-12-01 10:00:00  100.0
    2020-12-01 12:00:00    NaN
    2020-12-01 14:00:00    NaN
    2020-12-01 16:00:00   99.3
    2020-12-01 18:00:00   91.0
    

    Note that you don't need the on= keyword if your df already has a datetime index. Either way, the df returned by resampling is indexed by datetime.

    Also note that I'm using .mean() as the aggregation function, so if several values fall within the same two-hour interval, you'll get their mean.
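
For reference, here is a minimal end-to-end sketch of the steps above, using the sample data from the question. A few things in it are assumptions rather than part of the original answer: the column names `datetime` and `value`, the use of `pd.offsets.Hour(2)` in place of a string alias (to avoid depending on how a given pandas version spells it), the `asfreq` variant as a no-aggregation alternative, and the extra 09:00 reading, which is hypothetical and only there to show what `.mean()` does when two values share a bin.

```python
import pandas as pd

# Sample data from the question; the column names are assumed.
df = pd.DataFrame(
    {
        "datetime": [
            "2020-12-01 08:00:00",
            "2020-12-01 10:00:00",
            "2020-12-01 16:00:00",
            "2020-12-01 18:00:00",
        ],
        "value": [145.9, 100.0, 99.3, 91.0],
    }
)
df["datetime"] = pd.to_datetime(df["datetime"])

# An offset object instead of a string alias such as '2H' / '2h'.
two_hours = pd.offsets.Hour(2)

# Approach from the answer: resample and aggregate; empty bins become NaN.
resampled = df.resample(two_hours, on="datetime").mean()

# Alternative without aggregation: index by timestamp, then conform to the
# two-hour grid. This only keeps rows that already sit on a grid point.
conformed = df.set_index("datetime").asfreq(two_hours)

# Hypothetical extra reading at 09:00: it lands in the 08:00 bin, so
# .mean() averages it with the 08:00 value.
extra = pd.DataFrame(
    {"datetime": [pd.Timestamp("2020-12-01 09:00:00")], "value": [146.1]}
)
with_extra = pd.concat([df, extra], ignore_index=True).sort_values("datetime")
averaged = with_extra.resample(two_hours, on="datetime").mean()

print(resampled)
print(averaged)
```

If every timestamp is already on the two-hour grid and there are no duplicates, the `asfreq` variant should give the same rows as resampling; with off-grid or repeated timestamps, resampling with an explicit aggregation is probably the safer choice.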