I have a pandas dataframe with the following columns:
The Posting Date
goes from 2018-01-01 (yyyy-mm-dd) until 2022-10-30 (yyyy-mm-dd)
I want to group my dataframe by Material
and by Posting Date
date with granurality of 1 month and for the months within the Posting Date
range, fill the Total_Sum with zeros.
So i need my output to be somthing similar to this:
Material Posting Date Total_Sum
0 E7400R101012 2018-01 0
1 E7400R101012 2018-02 0
2 E7400R101012 2018-03 0
3 E7400R101012 2018-04 0
4 E7400R101012 2018-05 0
5 E7400R101012 2018-06 2
6 E7400R101012 2018-07 0
7 E7400R101012 2018-08 0
8 E7400R101012 2018-09 0
9 E7400R101012 2018-10 0
10 E7400R101012 2018-11 3
11 E7400R101012 2018-12 0
-- ---
nn E7400R101012 2022-10 0
nn NEW CODE 2018-01 0
Convert column Posting Date
to month periods by Series.dt.to_period
, so possible create period_range
by minimal and maximal values and create MultiIndex.from_product
.
Then aggregate sum
and add missing pairs by DataFrame.reindex
:
df1['Posting Date'] = pd.to_datetime(df1['Posting Date']).dt.to_period('m')
periods = pd.period_range(df1['Posting Date'].min(), df1['Posting Date'].max(), freq='m')
mux = pd.MultiIndex.from_product([df1['Material'].unique(),
periods],
names=['Material', 'Posting Date'])
df2 = (df1.groupby(['Material', 'Posting Date']).sum()
.reindex(mux, fill_value=0)
.reset_index())