Search code examples
pythonpandasdataframepandas-resample

Resample within time frame in Python


I would like to resample df by creating monthly data for all columns and filling in missing values with 0, within the time frame of say 2019-01-01 to 2019-12-31.

df:

    ITEM_ID Date        Value YearMonth
0   101002  2019-03-31  1.0   2019-03
1   101002  2019-04-30  1.0   2019-04
2   101002  2019-10-31  0.0   2019-10
3   101002  2019-11-30  8.0   2019-11
4   101002  2019-12-31  5.0   2019-12

Expected output:

    ITEM_ID Date        Value YearMonth
         ...            0     2019-01 (added)
         ...            0     2019-02 (added)
0   101002  2019-03-31  1.0   2019-03
1   101002  2019-04-30  1.0   2019-04
         ...            0     2019-05 (added)
         ...            0     2019-06 (added)
         ...            0     2019-07 (added)
         ...            0     2019-08 (added)
         ...            0     2019-09 (added)
2   101002  2019-10-31  0.0   2019-10
3   101002  2019-11-30  8.0   2019-11
4   101002  2019-12-31  5.0   2019-12

I came across a few methods like multiindex and resample. multiindex seems to be versatile but gets a bit complicated when it involves different levels of indexes; I am not sure if resample allows me to extend the effect to specified time frame. What is the best way to do it?


Solution

  • I think you need DataFrame.reindex:

    df['YearMonth'] = pd.to_datetime(df['YearMonth'])
    r = pd.to_datetime(pd.date_range('2019-01-01', '2020-01-01', freq='1MS'))
    mux = pd.MultiIndex.from_product([df['ITEM_ID'].unique(), r], names=['ITEM_ID','YearMonth'])
    
    df = df.set_index(['ITEM_ID','YearMonth']).reindex(mux).fillna({'Value':0}).reset_index().reindex(df.columns, axis=1)
    
    print (df)
        ITEM_ID        Date  Value  YearMonth
    0    101002         NaN    0.0 2019-01-01
    1    101002         NaN    0.0 2019-02-01
    2    101002  2019-03-31    1.0 2019-03-01
    3    101002  2019-04-30    1.0 2019-04-01
    4    101002         NaN    0.0 2019-05-01
    5    101002         NaN    0.0 2019-06-01
    6    101002         NaN    0.0 2019-07-01
    7    101002         NaN    0.0 2019-08-01
    8    101002         NaN    0.0 2019-09-01
    9    101002  2019-10-31    0.0 2019-10-01
    10   101002  2019-11-30    8.0 2019-11-01
    11   101002  2019-12-31    5.0 2019-12-01
    12   101002         NaN    0.0 2020-01-01