Search code examples
pandasdataframemissing-datamulti-indeximputation

impute missing values in a multi index data frame with taking the last available value of second index level


I'm struggling with the following problem: I have a multilevel index data frame of time series data of following structure:

import pandas as pd
import numpy as np

multi_index = pd.MultiIndex.from_tuples([('2022-02-18', '2022-02-17'),
                                        ('2022-02-19', '2022-02-17'),
                                        ('2022-02-20', '2022-02-17'),
                                        ('2022-02-21', '2022-02-17'),
                                        ('2022-02-19', '2022-02-18'),
                                        ('2022-02-20', '2022-02-18'),
                                        ('2022-02-21', '2022-02-18'),
                                        ('2022-02-22', '2022-02-18'),
                                        ('2022-02-20', '2022-02-19'),
                                        ('2022-02-21', '2022-02-19'),
                                        ('2022-02-22', '2022-02-19'),
                                        ('2022-02-23', '2022-02-19'),],
                                       names=['date1','date2'])

data = [[45, 365], 
        [91, 254], 
        [60, 268], 
        [57, 781], 
        [68, 236], 
        [36, np.nan], 
        [87, 731], 
        [12, 452], 
        [np.nan, 214], 
        [33, 654], 
        [74, 113], 
        [65, 381]]

df = pd.DataFrame(data, columns=['value1', 'value2'], index = multi_index)

df looks like the following table:

date1 date2 value1 value2
2022-02-18 2022-02-17 45 365
2022-02-19 2022-02-17 91 254
2022-02-20 2022-02-17 60 268
2022-02-21 2022-02-17 57 781
2022-02-19 2022-02-18 68 236
2022-02-20 2022-02-18 36 NaN
2022-02-21 2022-02-18 87 731
2022-02-22 2022-02-18 12 452
2022-02-20 2022-02-19 NaN 214
2022-02-21 2022-02-19 33 654
2022-02-22 2022-02-19 74 113
2022-02-23 2022-02-19 65 381

date1 and date2 are the multi index. I would like to impute the missing values in the table with the last available value in date2. So in this case it would be 36 for value1 and 268 for value2, because I want that date1 of the imputed value is the same and for date2 I want to take the last available value.

I tried to impute with pandas method fillna() and tried different variations of the hyper parameter 'method' but nothing seems to be a proper solution for my problem.


Solution

  • This should give you what you've described:

    df.groupby('date1').ffill()