
Fill estimated values between two measurements in a pandas DataFrame?


I have a time-series dataset that I want to use to train an LSTM. The target column is a measurement of a liquid taken at certain timesteps during an experiment.

For example, at the 0th timestep (the beginning of the experiment) the alcohol concentration is 0, and when it is measured again an hour later, the concentration is 10.

The timesteps between those two measurements are NaN because the alcohol concentration was not measured during that time.

I want to fill those NaNs with estimates; they do not have to be accurate.

Example dataset

Timestamp  concentration
10:15          0 
10:20          NaN
10:30          NaN
10:40          NaN
10:50          NaN
11:00          NaN
10:15          10
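For anyone who wants to run the answer below, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that copies the timestamps exactly as shown (kept as plain strings) and uses `np.nan` for the unmeasured rows; the variable name `df` is chosen to match the answer's code.

```python
import numpy as np
import pandas as pd

# Sample table from the question: timestamps kept as plain strings,
# NaN wherever no concentration was measured.
df = pd.DataFrame({
    "Timestamp": ["10:15", "10:20", "10:30", "10:40", "10:50", "11:00", "10:15"],
    "concentration": [0, np.nan, np.nan, np.nan, np.nan, np.nan, 10],
})
print(df)
```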

I want to fill those NaNs with plausible values in the range 0-10, and do the same for the whole column, filling every gap between two measured concentrations.


Solution

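  • Try Series.interpolate(). By default it fills each run of NaNs linearly between the measured values before and after it, treating the rows as equally spaced: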

    import pandas as pd

    df.set_index(pd.to_datetime(df['Timestamp']))['concentration'].interpolate()
    

    Output:

    Timestamp
    2023-07-26 10:15:00     0.000000
    2023-07-26 10:20:00     1.666667
    2023-07-26 10:30:00     3.333333
    2023-07-26 10:40:00     5.000000
    2023-07-26 10:50:00     6.666667
    2023-07-26 11:00:00     8.333333
    2023-07-26 10:15:00    10.000000
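The default `interpolate()` (method `'linear'`) ignores the index, which is why the output above steps by exactly 10/6 (about 1.667) per row. The sample timestamps are not evenly spaced (the first gap is 5 minutes, the later ones 10 minutes or more), so if the concentration is assumed to change linearly with time, `method='time'` weights each filled value by the elapsed time instead. This is an alternative sketch, not part of the answer above. It assumes the final reading was taken at 11:15 (one hour after the start, as the question describes), uses 2023-07-26 only as an arbitrary date to build full timestamps, and adds two hypothetical rows (11:45 and 12:15) to show that every gap in the column is filled.

```python
import numpy as np
import pandas as pd

# Assumptions: final reading at 11:15; 2023-07-26 is an arbitrary date.
# The 11:45 / 12:15 rows are hypothetical, added to show a second gap.
times = ["10:15", "10:20", "10:30", "10:40", "10:50", "11:00", "11:15", "11:45", "12:15"]
values = [0, np.nan, np.nan, np.nan, np.nan, np.nan, 10, np.nan, 16]

s = pd.Series(values, index=pd.to_datetime(["2023-07-26 " + t for t in times]))

# method="time" interpolates by elapsed time between the surrounding
# measurements; it requires a DatetimeIndex.
filled = s.interpolate(method="time")
print(filled.round(3))
```

With these inputs the 10:30 row becomes 2.5 and the 11:00 row 7.5, instead of 3.333 and 8.333 from the equally spaced default. Which variant fits better depends on whether the concentration is expected to change linearly with time; either way the filled values are estimates, not measurements.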