I have a time series dataset on which I want to train an LSTM. The target column is a measurement of a liquid taken at certain timesteps during an experiment.
For example: at the 0th timestep (the beginning of the experiment) the concentration of alcohol is 0, and when it is measured again after an hour, the concentration is 10.
So the timesteps between the start of the experiment and the one-hour mark have NaN values, because the alcohol concentration was not measured during that time.
I want to fill those NaNs with values that do not have to be accurate; a rough estimate is fine.
Timestamp concentration
10:15 0
10:20 NaN
10:30 NaN
10:40 NaN
10:50 NaN
11:00 NaN
10:15 10
I want to fill those NaNs with values appropriate for the range 0-10, and in the same way fill the whole column with values between each pair of consecutive measured concentrations.
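Try interpolate(). With the default method='linear' it ignores the index, treats the rows as equally spaced, and fills each run of NaNs with evenly stepped values between the two surrounding measurements: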
df.set_index(pd.to_datetime(df['Timestamp']))['concentration'].interpolate()
Output:
Timestamp
2023-07-26 10:15:00 0.000000
2023-07-26 10:20:00 1.666667
2023-07-26 10:30:00 3.333333
2023-07-26 10:40:00 5.000000
2023-07-26 10:50:00 6.666667
2023-07-26 11:00:00 8.333333
2023-07-26 10:15:00 10.000000
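
Note that the steps in the output above are all equal even though the first gap is 5 minutes and the others are 10. If the estimates should follow the actual time elapsed, interpolate(method="time") may be a better fit. Below is a minimal sketch, not a definitive recipe: it rebuilds the sample data by hand and assumes the last timestamp was meant to be 11:15 (an hour after the start, as the question describes), since method="time" needs a DatetimeIndex that is in chronological order.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
# Assumption: the final timestamp is 11:15, one hour after the start.
df = pd.DataFrame({
    "Timestamp": ["10:15", "10:20", "10:30", "10:40", "10:50", "11:00", "11:15"],
    "concentration": [0, np.nan, np.nan, np.nan, np.nan, np.nan, 10],
})

# Use the parsed timestamps as the index so pandas knows the real spacing.
series = df.set_index(pd.to_datetime(df["Timestamp"], format="%H:%M"))["concentration"]

# Default: rows are treated as equally spaced, the index is ignored.
linear = series.interpolate()

# Time-weighted: each NaN is filled in proportion to the elapsed time.
timed = series.interpolate(method="time")

print(pd.DataFrame({"linear": linear, "time": timed}))
```

With this data the time-weighted column gives 2.5 at 10:30 (15 of 60 minutes elapsed), whereas the default gives 3.333333 (2 of 6 steps). Either is probably adequate as a rough estimate for LSTM input; the difference only matters when the sampling interval is irregular.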