I have a time series of month/day and time stamps followed by values. Because of equipment failure, some times are missing. I want to restore the missing times (e.g. 21:00 and 01:00 below) and interpolate the values associated with them. What is a good way to do this?
The data looks like:
03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0
The date/time values (e.g. "04/01 00:00") are strings and the observations are floats.
I converted the string dates to datetime objects using:
date_number=datetime.strptime(col_1[i],'%m/%d %H:%M')
which yields "1900-03-31 19:00:00" as the result (the year defaults to 1900 because the format string contains none). I can do arithmetic on those, find the gaps, fill them in, put NaNs in the same places in the other column, and then interpolate the missing values. I'm sure there is a more efficient, standard approach to the problem, and I'd like to know how best to do it.
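For reference, a minimal sketch of that manual approach, assuming hourly sampling and that the two columns are plain lists (`col_2` is a hypothetical name for the values column; only `col_1` appears above). It is an illustration of the idea, not a recommended solution:

```python
from datetime import datetime, timedelta

# Sample columns from the question; `col_2` is a hypothetical name for the values.
col_1 = ["03/31 19:00", "03/31 20:00", "03/31 22:00",
         "03/31 23:00", "04/01 00:00", "04/01 02:00"]
col_2 = [68.0, 68.0, 70.0, 68.0, 69.0, 70.0]

step = timedelta(hours=1)  # assumed sampling interval
times = [datetime.strptime(s, "%m/%d %H:%M") for s in col_1]

filled_times, filled_values = [times[0]], [col_2[0]]
for (t0, v0), (t1, v1) in zip(zip(times, col_2), zip(times[1:], col_2[1:])):
    n = round((t1 - t0) / step)  # number of hourly steps between neighbours
    for k in range(1, n):
        # Linear interpolation between the two known observations
        filled_times.append(t0 + k * step)
        filled_values.append(v0 + (v1 - v0) * k / n)
    filled_times.append(t1)
    filled_values.append(v1)

for t, v in zip(filled_times, filled_values):
    print(t.strftime("%m/%d %H:%M"), v)
```

This works, but the gap detection and the interpolation are both hand-rolled, which is the part I'd like to replace with something standard.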
Suppose the following dataframe:
>>> df
Date Value
0 03/31 19:00 68.0
1 03/31 20:00 68.0
2 03/31 22:00 70.0
3 03/31 23:00 68.0
4 04/01 00:00 69.0
5 04/01 02:00 70.0
You can create a Series indexed by Date so that you can work with it as a real time series:
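import pandas as pd
# No year in the data, so 2023 is assumed; 'H' is hourly (spelled 'h' in pandas 2.2+)
df['Date'] = pd.to_datetime('2023/' + df['Date'], format='%Y/%m/%d %H:%M')
ts = df.set_index('Date')['Value'].resample('H').interpolate()  # adds missing hours, fills them linearly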
Output:
>>> ts
Date
2023-03-31 19:00:00 68.0
2023-03-31 20:00:00 68.0
2023-03-31 21:00:00 69.0 # <- HERE
2023-03-31 22:00:00 70.0
2023-03-31 23:00:00 68.0
2023-04-01 00:00:00 69.0
2023-04-01 01:00:00 69.5 # <- HERE
2023-04-01 02:00:00 70.0
Freq: H, Name: Value, dtype: float64
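If you also want to know which timestamps were filled in, here is a self-contained sketch of a slightly more explicit variant. It rebuilds the sample frame, uses `asfreq` to insert the missing hours as NaN, and then interpolates. The year 2023 is still an assumption, and `"60min"` is used as the frequency string only because it should be accepted by both older and newer pandas releases; `'H'` or `'h'` is equivalent where supported.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["03/31 19:00", "03/31 20:00", "03/31 22:00",
             "03/31 23:00", "04/01 00:00", "04/01 02:00"],
    "Value": [68.0, 68.0, 70.0, 68.0, 69.0, 70.0],
})

# The year is not in the data; 2023 is assumed, as above.
df["Date"] = pd.to_datetime("2023/" + df["Date"], format="%Y/%m/%d %H:%M")
observed = df.set_index("Date")["Value"]

# asfreq() inserts the missing hours as NaN without touching existing values.
hourly = observed.asfreq("60min")

# Timestamps that were absent from the original data.
missing = hourly.index[hourly.isna()]

# method="time" weights by the actual time distance between points;
# on an evenly spaced index it matches the default linear method.
ts = hourly.interpolate(method="time")

print(ts)
print("Filled in:", list(missing.strftime("%m/%d %H:%M")))
```

Keeping the `asfreq` step separate makes it easy to inspect or log the gaps before they are filled, and `method="time"` stays correct if you later resample to an irregular index.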