Using Python, how do I fill in missing dates and data in two columns?


I have a time series of month/day-time followed by values. Because of equipment failures, some times are missing. I want to fill in those times (e.g. 21:00 and 01:00 below) and interpolate the missing values associated with them. What is a good way to do this?

The data looks like:

03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0

The "04/01 00:00" values are strings and the observations are floats.

I converted the string dates to datetime objects using: date_number = datetime.strptime(col_1[i], '%m/%d %H:%M'), which yields "1900-03-31 19:00:00" as the result. I can do arithmetic on those, find the gaps, fill them in, put NaNs in the same places in the other column, and then interpolate those missing values. I'm sure there is a more efficient, standard approach to the problem, and I'd like to know how best to do it.


Solution

  • Suppose you have the following DataFrame:

    >>> df
              Date  Value
    0  03/31 19:00   68.0
    1  03/31 20:00   68.0
    2  03/31 22:00   70.0
    3  03/31 23:00   68.0
    4  04/01 00:00   69.0
    5  04/01 02:00   70.0
    

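    Parse the dates, set them as the index, then resample to hourly frequency and interpolate linearly. The data has no year, so one is prepended for parsing (2023 here is arbitrary). Note that pandas 2.2 and later deprecate the 'H' alias in favor of 'h':

    import pandas as pd
    df['Date'] = pd.to_datetime('2023/' + df['Date'], format='%Y/%m/%d %H:%M')
    # Resampling to hourly rows leaves NaN in the gaps; interpolate() fills them linearly
    ts = df.set_index('Date')['Value'].resample('H').interpolate()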
    

    Output:

    >>> ts
    Date
    2023-03-31 19:00:00    68.0
    2023-03-31 20:00:00    68.0
    2023-03-31 21:00:00    69.0  # <- HERE
    2023-03-31 22:00:00    70.0
    2023-03-31 23:00:00    68.0
    2023-04-01 00:00:00    69.0
    2023-04-01 01:00:00    69.5  # <- HERE
    2023-04-01 02:00:00    70.0
    Freq: H, Name: Value, dtype: float64
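
For reference, here is a minimal end-to-end sketch that starts from the raw text in the question and makes the intermediate steps explicit: build the full hourly index, see which timestamps are missing, then fill them. It is one possible variant, not the only way to do it. The names raw, day, time, value, gaps and filled are illustrative, the year 2023 is an arbitrary assumption (the data has none), and the frequency is passed as a pd.Timedelta so the example does not depend on the 'H' versus 'h' alias.

```python
import io

import pandas as pd

# Raw observations from the question: month/day, time, value.
raw = """03/31 19:00 68.0
03/31 20:00 68.0
03/31 22:00 70.0
03/31 23:00 68.0
04/01 00:00 69.0
04/01 02:00 70.0
"""

# Whitespace-separated columns; the column names are illustrative.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["day", "time", "value"])

# The data carries no year, so an arbitrary one (2023) is assumed for parsing.
stamps = pd.to_datetime("2023/" + df["day"] + " " + df["time"],
                        format="%Y/%m/%d %H:%M")
series = pd.Series(df["value"].to_numpy(), index=pd.DatetimeIndex(stamps))

# Complete hourly index spanning the first to the last observation.
full_index = pd.date_range(series.index.min(), series.index.max(),
                           freq=pd.Timedelta(hours=1))

# Timestamps lost to the equipment failure.
gaps = full_index.difference(series.index)

# Reindexing inserts NaN at the gaps; interpolate() then fills them linearly.
filled = series.reindex(full_index).interpolate()

print(gaps)
print(filled)
```

Because the completed index is evenly spaced, the default linear interpolation is appropriate here. If the index were left unevenly spaced, interpolate(method="time") would weight by elapsed time instead.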