Search code examples
pythonpandastransactionsanalysisbank

How to compare average value from recent 2 weeks to average of previous 6 weeks by group


I'm trying to create a function that would affectively analyze merchant bank balance transaction averages across time in order to identify trends that could notify a lender of an increase in likelihood of missing a payment.

So if the merchants most recent 2 week average bank balance is significantly below their own 6 week average, then I'd like to send an automated report of the merchants that meet this condition.

Essentially I want to be able to calculate and compare each merchants 2 week average bank balance with the same merchants 6 week average. I'm able to calculate the average balance cant compare the most recent 2 weeks with the 6 week prior

I'll show my attempt and what went wrong here:

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': pd.to_datetime(['2021-12-20','2021-10-20','2021-12-24','2021-12-19','2021-12-01','2021-12-12','2021-10-29','2021-11-09','2021-11-16']), 
                       'credits': [0,1,0,0,1,1,0,1,0],
                       'debits': [1,0,1,1,0,0,1,0,1],
                       'merchant': ['target','target','target','costco','costco','costco','publix','publix','publix'],
                       'amount': [-230, 558,-185,-899,1009,564,-741,199,-187]})

df.index = df.date grouped = df.groupby(['merchant']).resample('W')[['amount']].mean()

df["Average Credits in the last 2 weeks"] = df.rolling("7D", on="date").mean()["credits"]
df["Average Credits in the last 6 weeks"] = df.rolling("42D", on="date").mean()["credits"]
df['flag'] = df.apply(lambda row: row["Average Credits in the last 6 weeks"] > row["Average Credits in the last 2 weeks"], axis=1)

So I failed to calculate the rolling average of credits by merchant with these functions. I used credits as an example because when trying to use the rolling function with mean()['amount'] instead of mean()['credits] would give me an error.

If I'm going about this the wrong way I'd love for any help you can give me. Thanks in advance!


Solution

  • I understood that your issue is that you don't manage to get the rolling average based on the category "merchant". If that is it, one way would be this one:

    1. First groupby with resample Write the resample, the same way that you already had, I just added a ffill (fill down to avoid nans if you don't have one of the datapoints):
      df.index = df.date
      grouped = df.groupby(['merchant']).resample('W')[['amount']].mean()
      grouped.ffill(axis = 0,inplace = True)

    2. Apply groupby again, with rolling on the column "amount" for 2 and 6 weeks. Note that there will be nans in this example because for example to calculate the rolling of 2 weeks if will need at least the previous week:

    grouped = grouped.reset_index()
    grouped.index = grouped.date
    mean_2 = grouped.groupby(["merchant"]).rolling(2).amount.mean()
    mean_6 = grouped.groupby(["merchant"]).rolling(6).amount.mean()

    1. Concatenate series
      result=pd.concat([mean_2,mean_6],axis=1)
      result.columns = ["mean2","mean6"]

    The result is like this:
    enter image description here