I have a dataframe containing data measured at two-hour intervals each day, but some time intervals are missing. My dataset looks like this:
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
I'm trying to insert the missing time intervals and fill their values with NaN:
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
I would appreciate any help on how to achieve this in Python, as I'm a newbie just starting out with the language.
Assuming your df looks like this:
datetime value
0 2020-12-01T08:00:00 145.9
1 2020-12-01T10:00:00 100.0
2 2020-12-01T16:00:00 99.3
3 2020-12-01T18:00:00 91.0
Make sure the datetime column has a datetime dtype:
df['datetime'] = pd.to_datetime(df['datetime'])
You can then resample to a 2-hourly frequency:
df.resample('2H', on='datetime').mean()
value
datetime
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
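For reference, here is a minimal end-to-end sketch of the steps above. It rebuilds the sample data from the question by hand (the column names datetime and value are assumed from the example df), and it uses the lowercase "2h" alias, which as far as I know is the spelling preferred by recent pandas versions over "2H":

```python
import pandas as pd

# Sample data from the question; column names are assumed from the example df.
df = pd.DataFrame({
    "datetime": [
        "2020-12-01T08:00:00",
        "2020-12-01T10:00:00",
        "2020-12-01T16:00:00",
        "2020-12-01T18:00:00",
    ],
    "value": [145.9, 100.0, 99.3, 91.0],
})

# Convert the strings to real datetimes so resampling can work on them.
df["datetime"] = pd.to_datetime(df["datetime"])

# Resample onto a regular 2-hour grid; bins without data come out as NaN.
out = df.resample("2h", on="datetime").mean()
print(out)
```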
Note that you don't need to set the on= keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.
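As a sketch of the datetime-index case, assuming the same sample data with the timestamps placed in the index (the variable names here are only illustrative): resample can then be called without on=. If the index has no duplicate timestamps, asfreq is a different technique that should give the same grid here, since it only reindexes and does not aggregate.

```python
import pandas as pd

# Same sample data, but with the timestamps as the index.
idx = pd.to_datetime([
    "2020-12-01 08:00:00",
    "2020-12-01 10:00:00",
    "2020-12-01 16:00:00",
    "2020-12-01 18:00:00",
])
df_indexed = pd.DataFrame({"value": [145.9, 100.0, 99.3, 91.0]}, index=idx)
df_indexed.index.name = "datetime"

# No on= needed: resample uses the datetime index directly.
resampled = df_indexed.resample("2h").mean()

# Alternative without aggregation: reindex onto a regular 2-hour grid.
# Assumes unique timestamps that already sit on the 2-hour grid.
filled = df_indexed.asfreq("2h")

print(resampled)
print(filled)
```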
Also note that I'm using .mean() as the aggregation function, meaning that if you have multiple values within a two-hour interval, you'll get their mean.
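To illustrate the aggregation, here is a small sketch with made-up values (not from the question) where two readings fall into the same two-hour bin. The 08:00 and 09:00 readings both land in the bin starting at 08:00 and are averaged, while the empty 10:00 bin becomes NaN:

```python
import pandas as pd

# Hypothetical readings: two of them fall inside the 08:00-10:00 bin.
df_multi = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-12-01 08:00:00",
        "2020-12-01 09:00:00",
        "2020-12-01 12:00:00",
    ]),
    "value": [100.0, 200.0, 50.0],
})

# Each 2-hour bin is reduced to the mean of the values it contains.
agg = df_multi.resample("2h", on="datetime").mean()
print(agg)
```

If averaging is not what you want, other aggregations such as .first() or .max() can be used in place of .mean().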