Search code examples
pandasgroup-by

Fill missing months with zeros on a Groupby


I have a pandas dataframe with the following columns:

enter image description here

The Posting Date goes from 2018-01-01 (yyyy-mm-dd) until 2022-10-30 (yyyy-mm-dd) I want to group my dataframe by Material and by Posting Date date with granurality of 1 month and for the months within the Posting Date range, fill the Total_Sum with zeros. So i need my output to be somthing similar to this:

    Material        Posting Date      Total_Sum
0   E7400R101012      2018-01            0
1   E7400R101012      2018-02            0
2   E7400R101012      2018-03            0
3   E7400R101012      2018-04            0
4   E7400R101012      2018-05            0
5   E7400R101012      2018-06            2
6   E7400R101012      2018-07            0
7   E7400R101012      2018-08            0
8   E7400R101012      2018-09            0
9   E7400R101012      2018-10            0
10  E7400R101012      2018-11            3
11  E7400R101012      2018-12            0
--                     ---
nn  E7400R101012      2022-10            0
nn  NEW CODE          2018-01            0

Solution

  • Convert column Posting Date to month periods by Series.dt.to_period, so possible create period_range by minimal and maximal values and create MultiIndex.from_product.

    Then aggregate sum and add missing pairs by DataFrame.reindex:

    df1['Posting Date'] = pd.to_datetime(df1['Posting Date']).dt.to_period('m')
    periods = pd.period_range(df1['Posting Date'].min(), df1['Posting Date'].max(), freq='m')
    mux = pd.MultiIndex.from_product([df1['Material'].unique(), 
                                      periods], 
                                      names=['Material', 'Posting Date'])
    
    df2 = (df1.groupby(['Material', 'Posting Date']).sum()
              .reindex(mux, fill_value=0)
              .reset_index())