I would like to resample df
by creating monthly data for all columns and filling in missing values with 0
, within the time frame of say 2019-01-01
to 2019-12-31
.
df
:
ITEM_ID Date Value YearMonth
0 101002 2019-03-31 1.0 2019-03
1 101002 2019-04-30 1.0 2019-04
2 101002 2019-10-31 0.0 2019-10
3 101002 2019-11-30 8.0 2019-11
4 101002 2019-12-31 5.0 2019-12
Expected output:
ITEM_ID Date Value YearMonth
... 0 2019-01 (added)
... 0 2019-02 (added)
0 101002 2019-03-31 1.0 2019-03
1 101002 2019-04-30 1.0 2019-04
... 0 2019-05 (added)
... 0 2019-06 (added)
... 0 2019-07 (added)
... 0 2019-08 (added)
... 0 2019-09 (added)
2 101002 2019-10-31 0.0 2019-10
3 101002 2019-11-30 8.0 2019-11
4 101002 2019-12-31 5.0 2019-12
I came across a few methods like multiindex
and resample
. multiindex
seems to be versatile but gets a bit complicated when it involves different levels of indexes; I am not sure if resample
allows me to extend the effect to specified time frame. What is the best way to do it?
I think you need DataFrame.reindex
:
df['YearMonth'] = pd.to_datetime(df['YearMonth'])
r = pd.to_datetime(pd.date_range('2019-01-01', '2020-01-01', freq='1MS'))
mux = pd.MultiIndex.from_product([df['ITEM_ID'].unique(), r], names=['ITEM_ID','YearMonth'])
df = df.set_index(['ITEM_ID','YearMonth']).reindex(mux).fillna({'Value':0}).reset_index().reindex(df.columns, axis=1)
print (df)
ITEM_ID Date Value YearMonth
0 101002 NaN 0.0 2019-01-01
1 101002 NaN 0.0 2019-02-01
2 101002 2019-03-31 1.0 2019-03-01
3 101002 2019-04-30 1.0 2019-04-01
4 101002 NaN 0.0 2019-05-01
5 101002 NaN 0.0 2019-06-01
6 101002 NaN 0.0 2019-07-01
7 101002 NaN 0.0 2019-08-01
8 101002 NaN 0.0 2019-09-01
9 101002 2019-10-31 0.0 2019-10-01
10 101002 2019-11-30 8.0 2019-11-01
11 101002 2019-12-31 5.0 2019-12-01
12 101002 NaN 0.0 2020-01-01