I'm struggling with the following problem: I have a multilevel index data frame of time series data of following structure:
import pandas as pd
import numpy as np
multi_index = pd.MultiIndex.from_tuples([('2022-02-18', '2022-02-17'),
('2022-02-19', '2022-02-17'),
('2022-02-20', '2022-02-17'),
('2022-02-21', '2022-02-17'),
('2022-02-19', '2022-02-18'),
('2022-02-20', '2022-02-18'),
('2022-02-21', '2022-02-18'),
('2022-02-22', '2022-02-18'),
('2022-02-20', '2022-02-19'),
('2022-02-21', '2022-02-19'),
('2022-02-22', '2022-02-19'),
('2022-02-23', '2022-02-19'),],
names=['date1','date2'])
data = [[45, 365],
[91, 254],
[60, 268],
[57, 781],
[68, 236],
[36, np.nan],
[87, 731],
[12, 452],
[np.nan, 214],
[33, 654],
[74, 113],
[65, 381]]
df = pd.DataFrame(data, columns=['value1', 'value2'], index = multi_index)
df looks like the following table:
date1 | date2 | value1 | value2 |
---|---|---|---|
2022-02-18 | 2022-02-17 | 45 | 365 |
2022-02-19 | 2022-02-17 | 91 | 254 |
2022-02-20 | 2022-02-17 | 60 | 268 |
2022-02-21 | 2022-02-17 | 57 | 781 |
2022-02-19 | 2022-02-18 | 68 | 236 |
2022-02-20 | 2022-02-18 | 36 | NaN |
2022-02-21 | 2022-02-18 | 87 | 731 |
2022-02-22 | 2022-02-18 | 12 | 452 |
2022-02-20 | 2022-02-19 | NaN | 214 |
2022-02-21 | 2022-02-19 | 33 | 654 |
2022-02-22 | 2022-02-19 | 74 | 113 |
2022-02-23 | 2022-02-19 | 65 | 381 |
date1 and date2 are the multi index. I would like to impute the missing values in the table with the last available value in date2. So in this case it would be 36 for value1 and 268 for value2, because I want that date1 of the imputed value is the same and for date2 I want to take the last available value.
I tried to impute with pandas method fillna() and tried different variations of the hyper parameter 'method' but nothing seems to be a proper solution for my problem.
This should give you what you've described:
df.groupby('date1').ffill()