Search code examples
pythonpandasreindexpandas-resampleresample

Pandas resample with multiindex start- and enddate


Suppose I have a multi-index Pandas data frame with two index levels: month_begin and month_end

import pandas as pd

multi_index = pd.MultiIndex.from_tuples([("2022-03-01", "2022-03-31"), 
                                  ("2022-04-01", "2022-04-30"), 
                                  ("2022-05-01", "2022-05-31"),
                                  ("2022-06-01", "2022-06-30")])

multi_index.names = ['month_begin', 'month_end']

df = pd.DataFrame(np.random.rand(4,100), index=multi_index)
df
                              0         1   ...        98        99
month_begin month_end                       ...                    
2022-03-01  2022-03-31  0.322032  0.205307  ...  0.975128  0.673460
2022-04-01  2022-04-30  0.113813  0.278981  ...  0.951049  0.090765
2022-05-01  2022-05-31  0.777918  0.842734  ...  0.667831  0.274189
2022-06-01  2022-06-30  0.221407  0.555711  ...  0.745158  0.648246

What I want to do is to resample the data on a daily basis such that the monthly values show up for every day in the month:

                              0         1   ...        98        99
                                            ...                    
2022-03-01             0.322032  0.205307  ...  0.975128  0.673460
2022-03-02             0.322032  0.205307  ...  0.975128  0.673460
2022-03-03             0.322032  0.205307  ...  0.975128  0.673460
...
2022-06-29             0.221407  0.555711  ...  0.745158  0.648246
2022-06-30             0.221407  0.555711  ...  0.745158  0.648246            

I tried to retrieve the month from the first index

df.index.get_level_values('month_begin'),

but then I don't know how to tell that it should resample it on a daily basis. Does anybody know how to do this? I would very much appreciate your help.


Solution

  • Use:

    #convert MultiIndex to columns
    df1 = df.reset_index()
    
    #if necessary convert to datetimes
    df1['month_begin'] = pd.to_datetime(df1['month_begin'])
    df1['month_end'] = pd.to_datetime(df1['month_end'])
    
    #repeat indices by difference between end and start in days
    df1 = (df1.loc[df1.index.repeat(df1['month_end'].sub(df1['month_begin']).dt.days + 1)]
              .drop('month_end',1))
    
    #creat ecounter converted to days timedeltas
    s = pd.to_timedelta(df1.groupby(level=0).cumcount(), unit='d')
    
    #add timedeltas to month_begin and convert to DatetimeIndex
    df1 = (df1.assign(month_begin = df1['month_begin'].add(s))
              .set_index('month_begin')
              .rename_axis(None))
    

    Output for first 3 columns:

    print (df1.iloc[:, :3])
                       0         1         2
    2022-03-01  0.009359  0.499058  0.113384
    2022-03-02  0.009359  0.499058  0.113384
    2022-03-03  0.009359  0.499058  0.113384
    2022-03-04  0.009359  0.499058  0.113384
    2022-03-05  0.009359  0.499058  0.113384
                 ...       ...       ...
    2022-06-26  0.460350  0.241094  0.821568
    2022-06-27  0.460350  0.241094  0.821568
    2022-06-28  0.460350  0.241094  0.821568
    2022-06-29  0.460350  0.241094  0.821568
    2022-06-30  0.460350  0.241094  0.821568
    
    [122 rows x 3 columns]