Search code examples
pythonpandasdataframenumpyrolling-computation

I have data frame and I want to calculate the last three month transaction count and sum for each Group Id


Input data set

Id Date TransAmt
A 2022-01-02 10
A 2022-01-02 20
A 2022-02-04 30
A 2022-02-05 20
A 2022-04-08 300
A 2022-04-11 100
A 2022-05-13 200
A 2022-06-12 20
A 2022-06-15 300
A 2022-08-16 100

Desired output

Id Date TransAmt CountThreeMonth AmountThreeMonths
A 2022-01-02 10 2 30
A 2022-01-02 20 2 30
A 2022-02-04 30 4 80
A 2022-02-05 20 4 80
A 2022-04-08 300 4 450
A 2022-04-11 100 4 450
A 2022-05-13 200 3 600
A 2022-06-12 20 5 920
A 2022-06-15 300 5 920
A 2022-08-16 100 3 420

Note: 1. There can be multiple transaction for same date i.e. on 2022-01-02 there are two transaction. 2. I want calculate last 3 months transaction like- Present Month total Transaction count + Previous two month total Transaction count. Similar logic for amount. like for Jan month only 2 transaction and previous month does not have any transaction so 2 + 0 + 0 =2. 3. I want all calculation for Each group of Id.

Please help me achieve my desired output Thanking you in Advanced.


Solution

  • Example

    data = [['A', '2022-01-02', 10], ['A', '2022-01-02', 20], ['A', '2022-02-04', 30], 
            ['A', '2022-02-05', 20], ['A', '2022-04-08', 300], ['A', '2022-04-11', 100],
            ['A', '2022-05-13', 200], ['A', '2022-06-12', 20], ['A', '2022-06-15', 300],
            ['A', '2022-08-16', 100], ['B', '2022-01-02', 10], ['B', '2022-01-02', 20],
            ['B', '2022-02-04', 30], ['B', '2022-02-05', 20], ['B', '2022-04-08', 300],
            ['B', '2022-04-11', 100], ['B', '2022-05-13', 200], ['B', '2022-06-12', 20],
            ['B', '2022-06-15', 300], ['B', '2022-08-16', 100]]
    df1 = pd.DataFrame(data, columns=['Id', 'Date', 'TransAmt'])
    

    df1

        Id  Date    TransAmt
    0   A   2022-01-02  10
    1   A   2022-01-02  20
    2   A   2022-02-04  30
    3   A   2022-02-05  20
    4   A   2022-04-08  300
    5   A   2022-04-11  100
    6   A   2022-05-13  200
    7   A   2022-06-12  20
    8   A   2022-06-15  300
    9   A   2022-08-16  100
    10  B   2022-01-02  10
    11  B   2022-01-02  20
    12  B   2022-02-04  30
    13  B   2022-02-05  20
    14  B   2022-04-08  300
    15  B   2022-04-11  100
    16  B   2022-05-13  200
    17  B   2022-06-12  20
    18  B   2022-06-15  300
    19  B   2022-08-16  100
    

    Code

    s = df1['Date']
    df1['Date'] = df1['Date'].astype('Period[M]') 
    df2 = df1.groupby(['Id', 'Date'])['TransAmt'].agg(['count', sum])
    idx1 = pd.period_range(df1['Date'].min(), df1['Date'].max(), freq='M')
    idx2 = pd.MultiIndex.from_product([df1['Id'].unique(), idx1])
    cols = ['Id', 'Date', 'CountThreeMonth', 'AmountofThreeMonth']
    n = 3
    df3 = df2.reindex(idx2, fill_value=0).groupby(level=0).rolling(n, min_periods=1).sum().droplevel(0).reset_index().set_axis(cols, axis=1)
    df1.merge(df3, how='left').assign(Date=s)
    

    result(df1.merge(df3, how='left').assign(Date=s))

        Id  Date    TransAmt    CountThreeMonth AmountofThreeMonth
    0   A   2022-01-02  10      2.0             30.0
    1   A   2022-01-02  20      2.0             30.0
    2   A   2022-02-04  30      4.0             80.0
    3   A   2022-02-05  20      4.0             80.0
    4   A   2022-04-08  300     4.0             450.0
    5   A   2022-04-11  100     4.0             450.0
    6   A   2022-05-13  200     3.0             600.0
    7   A   2022-06-12  20      5.0             920.0
    8   A   2022-06-15  300     5.0             920.0
    9   A   2022-08-16  100     3.0             420.0
    10  B   2022-01-02  10      2.0             30.0
    11  B   2022-01-02  20      2.0             30.0
    12  B   2022-02-04  30      4.0             80.0
    13  B   2022-02-05  20      4.0             80.0
    14  B   2022-04-08  300     4.0             450.0
    15  B   2022-04-11  100     4.0             450.0
    16  B   2022-05-13  200     3.0             600.0
    17  B   2022-06-12  20      5.0             920.0
    18  B   2022-06-15  300     5.0             920.0
    19  B   2022-08-16  100     3.0             420.0
    

    I'm sorry it's hard to explain