Search code examples
pythonpandasdataframejupyter-notebookpandas-groupby

Pandas - Group by Cumulative Calculation


I'm trying to do a calculation with cumulative operations using pandas in jupyter notebook.

But I'm stuck.

I have the following DataFrame:

Date Type deposit withdrawal
2020/01/19 A 1000 0
2020/01/23 A 500 0
2020/02/05 A 0 200
2020/02/22 A 1000 200
2020/01/19 B 859.8 0
2020/02/04 B 514.33 50
2020/02/25 B 1463.14 0

I need to apply this formula, grouping by type:

Movement = IF(ISNULL([previous_date]); [Current_Deposit] - [withdrawal];([previous_movement] * (1 +([current_date]-[previous_date]) * 0,0001)) + [Current_Deposit] - [withdrawal])

Example:

Date Type deposit withdrawal Movement Formula
2020/01/19 A 1000 0 1000 IF(ISNULL([NULL]); [1000] - [0];([0] * (1 +([2020/01/19]-[NULL]) * 0,0001)) + [1000] - [0])
2020/01/23 A 500 0 1500.40 IF(ISNULL([2020/01/19]); [500] - [0];([1000] * (1 +([2020/01/23]-[2020/01/19]) * 0,0001)) + [500] - [0])
2020/02/05 A 0 200 1302.35 IF(ISNULL([2020/01/23]); [0] - [200];([1500.40] * (1 +([2020/02/05]-[2020/01/23]) * 0,0001)) + [0] - [200])
2020/02/22 A 1000 200 2104,56 IF(ISNULL([2020/02/05]); [1000] - [200];([1302.35] * (1 +([2020/02/22]-[2020/02/05]) * 0,0001)) + [1000] - [200])
2020/01/19 B 859.8 0 859.80 IF(ISNULL([NULL]); [859.8] - [0];([0] * (1 +([2020/01/19]-[NULL]) * 0,0001)) + [859.8] - [0])
2020/02/04 B 514.33 50 1325.51 IF(ISNULL([2020/01/19]); [514.33] - [50];([859.80] * (1 +([2020/02/04]-[2020/01/19]) * 0,0001)) + [514.33] - [50])
2020/02/25 B 1463.14 0 2791.43 IF(ISNULL([2020/02/04]); [1463.14] - [0];([1325.51] * (1 +([2020/02/25]-[2020/02/04]) * 0,0001)) + [1463.14] - [0])

I'm not able to accumulate the values to put in the formula.

I need to group the calculations by type, and accumulate the values to use in the current line. I'm new to python and I'm not able to implement it.


Solution

  • I assume that your dataframe is sorted by Type and Date

    # silencing chained assignment warning
    pd.options.mode.chained_assignment = None
    
    df['Movement'] = 0
    
    df_new = df
    for row in df.iterrows():
        x = row[1]
        y = df_new.shift().loc[x.name]
        if x['Type'] == y['Type']:
            # computing the given days compound interest should calculate using power
            compound_interest = pow(1.0001, pd.Timedelta(pd.to_datetime(x['Date']) - pd.to_datetime(y['Date'])).days)
            # if you wish to use your formula uncomment the following line
            # compound_interest = 1 + 0.0001 * pd.Timedelta(pd.to_datetime(x['Date']) - pd.to_datetime(y['Date'])).days
            df_new['Movement'].loc[x.name] = compound_interest * y['Movement'] + x['deposit'] - x['withdrawal']
        else:
            df_new['Movement'].loc[x.name] = x['deposit']
    
    print(df_new)
    

    OUTPUT

          Date Type  deposit  withdrawal    Movement
    2020/01/19    A  1000.00           0 1000.000000
    2020/01/23    A   500.00           0 1500.400060
    2020/02/05    A     0.00         200 1302.351751
    2020/02/22    A  1000.00         200 2104.567521
    2020/01/19    B   859.80           0  859.800000
    2020/02/04    B   514.33          50 1325.506712
    2020/02/25    B  1463.14           0 2791.433062