Search code examples
pythonpandasdataframedategroup-by

How to include start and end date in pandas groupby/Grouper?


I have a example DataFrame below.

import pandas as pd
df = pd.DataFrame({
    'date':["2022-01-01", "2022-02-01", "2022-03-01",
             "2022-04-01", "2022-05-01", "2022-06-01"],
    'amount': [2,4,3,5,6,4],
    'category': ['credit', 'credit', 'debit', 'credit', 'debit', 'credit']})

df.date = pd.to_datetime(df.date)

print(df)
       date  amount category
0 2022-01-01       2   credit
1 2022-02-01       4   credit
2 2022-03-01       3    debit
3 2022-04-01       5   credit
4 2022-05-01       6    debit
5 2022-05-01       4    debit
6 2022-06-01       4   credit

I want to get the average sum of amount of debit's per month. However, when I try the groupby using grouper I get:

print(df[df['category']=='debit'].groupby(pd.Grouper(key="date", freq="MS")).sum())

            amount
date              
2022-03-01       3
2022-04-01       0
2022-05-01       10

My desired output has all the months from the original DataFrame but 0 in the months there are no debits

            amount
date  
2022-01-01       0
2022-02-01       0     
2022-03-01       3
2022-04-01       0
2022-05-01       10
2022-06-01       0

I have tried using "Origin" to set start date but no luck :(


Solution

  • Use df.mask and convert DatetimeIndex as PeriodIndex

    out = (df['amount'].mask(df['category'] == 'credit', other=0)
                       .groupby(df['date'].dt.to_period('M'))
                       .sum().reset_index())
    # Output
          date  amount
    0  2022-01       0
    1  2022-02       0
    2  2022-03       3
    3  2022-04       0
    4  2022-05      10
    5  2022-06       0
    

    Another way with pd.Grouper:

    out = (df.assign(amount=df['amount'] * df['category'].eq('debit'))
             .groupby(pd.Grouper(key='date', freq='MS'))['amount']
             .sum().reset_index())
    print(out)
    
    # Output
            date  amount
    0 2022-01-01       0
    1 2022-02-01       0
    2 2022-03-01       3
    3 2022-04-01       0
    4 2022-05-01      10
    5 2022-06-01       0
    

    The idea is to set amount to 0 when the category is "credit".