I have a example DataFrame below.
import pandas as pd
df = pd.DataFrame({
'date':["2022-01-01", "2022-02-01", "2022-03-01",
"2022-04-01", "2022-05-01", "2022-06-01"],
'amount': [2,4,3,5,6,4],
'category': ['credit', 'credit', 'debit', 'credit', 'debit', 'credit']})
df.date = pd.to_datetime(df.date)
print(df)
date amount category
0 2022-01-01 2 credit
1 2022-02-01 4 credit
2 2022-03-01 3 debit
3 2022-04-01 5 credit
4 2022-05-01 6 debit
5 2022-05-01 4 debit
6 2022-06-01 4 credit
I want to get the average sum of amount of debit's per month. However, when I try the groupby using grouper I get:
print(df[df['category']=='debit'].groupby(pd.Grouper(key="date", freq="MS")).sum())
amount
date
2022-03-01 3
2022-04-01 0
2022-05-01 10
My desired output has all the months from the original DataFrame but 0 in the months there are no debits
amount
date
2022-01-01 0
2022-02-01 0
2022-03-01 3
2022-04-01 0
2022-05-01 10
2022-06-01 0
I have tried using "Origin" to set start date but no luck :(
Use df.mask
and convert DatetimeIndex
as PeriodIndex
out = (df['amount'].mask(df['category'] == 'credit', other=0)
.groupby(df['date'].dt.to_period('M'))
.sum().reset_index())
# Output
date amount
0 2022-01 0
1 2022-02 0
2 2022-03 3
3 2022-04 0
4 2022-05 10
5 2022-06 0
Another way with pd.Grouper
:
out = (df.assign(amount=df['amount'] * df['category'].eq('debit'))
.groupby(pd.Grouper(key='date', freq='MS'))['amount']
.sum().reset_index())
print(out)
# Output
date amount
0 2022-01-01 0
1 2022-02-01 0
2 2022-03-01 3
3 2022-04-01 0
4 2022-05-01 10
5 2022-06-01 0
The idea is to set amount
to 0 when the category
is "credit".