Search code examples
pythonpython-3.xpandaspandas-resample

Resampling of MultiIndex


I would like a daily breakdown of a dataset by type. There are not records for every day for every type, where they don't exist I would like NaN.

I'm able to get a 'resampled to daily' result, but the type is ommitted.

The code below should be a fully working example (well, working apart from the known error at the end!):

import pandas as pd
import datetime as dt

df = pd.DataFrame({
    'Date': [dt.datetime(2021,1,1), dt.datetime(2021, 1, 3), dt.datetime(2020,1,2)],
    'Type': ['A', 'A', 'B'],
    'Value': [1,2,3]
})

df.set_index('Date', inplace=True)
#   this loses the 'type'
print(df.resample('1D').mean())

df = df.reset_index().set_index(['Date', 'Type'])

#   this raises an exception "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'"
print(df.resample('1D').mean())

Output I'm looking for is a row for every day / type combo:

date type value
20210101 A 1
20210102 A NaN
20210103 A 2
20210101 B NaN
20210102 B 3
20210103 B NaN

Any advice or pointers gratefully received.


Solution

  • If need resample per groups is possible use Grouper for resample per days and then for add missing values is used Series.unstack with DataFrame.stack:

    df = (df.groupby(['Type', pd.Grouper(freq='1D', key='Date')])['Value']
            .mean()
            .unstack()
            .stack(dropna=False)
            .reset_index(name='Value')
          
          )
    print (df)  
      Type       Date  Value
    0    A 2021-01-01    1.0
    1    A 2021-01-02    NaN
    2    A 2021-01-03    2.0
    3    B 2021-01-01    NaN
    4    B 2021-01-02    3.0
    5    B 2021-01-03    NaN
    

    If need only append missing datetimes per groups is used DataFrame.reindex:

    mux = pd.MultiIndex.from_product([df['Type'].unique(),
                                      pd.date_range(df['Date'].min(), df['Date'].max())], 
                                      names=['Date','Type'])
    df = df.set_index(['Type','Date']).reindex(mux).reset_index()
    print (df)                
      Date       Type  Value
    0    A 2021-01-01    1.0
    1    A 2021-01-02    NaN
    2    A 2021-01-03    2.0
    3    B 2021-01-01    NaN
    4    B 2021-01-02    3.0
    5    B 2021-01-03    NaN