Search code examples
pythonpandasfor-loop

Incremental computation of total value over dates and groups


Here is my dataframe. There is a date index and there are 4 symbols for each date. I want to loop over each date for each symbol. The 'quantity' column is calculated based on the 'tot_value' of the previous date. The 'tot_value' is computed for a specific date and is common for all symbols. The 'value' column varies for each symbol for each date.

It is an issue with the way I am using shift here. It does not reference the previous date value. Instead it uses the default value of tot_value that I use while populating the dataframe. However, in the final result the tot_value is getting computed correctly.

I am new to python and would appreciate any help with this loop.

dataframe

Here is my code.

import pandas as pd

# create the dataframe 
data = {'symbol': ['A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D'],
        'date':['05/06/2024','05/06/2024','05/06/2024','05/06/2024',
                               '05/07/2024','05/07/2024','05/07/2024','05/07/2024',
                               '05/08/2024','05/08/2024','05/08/2024','05/08/2024',
                               '05/09/2024','05/09/2024','05/09/2024','05/09/2024'],
       'tot_value': [1000, 1000, 1000, 1000,1000, 1000, 1000, 1000,1000, 1000, 1000, 1000,1000, 1000, 1000, 1000],
       'mult': [1, 1.1, 1.2, 1.3,1.4, 1.5, 1.6, 1.7,1.8, 1.9, 2, 2.1,2.2, 2.3, 2.4, 2.5],
       'quantity': [0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0],
       'value': [0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0],
       }

df = pd.DataFrame(data)
df.set_index(['date'], inplace = True)
symbols = df['symbol'].unique()

# loop by date index and symbol
for ind in df.index.unique():
    for symbol in symbols:
        df['quantity'][ind] = df['tot_value'][ind].shift(1) * df['mult'][ind]
        df['value'][ind] = df['quantity'][ind] * 5

    g = df.groupby('date')['value'].sum()
    df['tot_value'][ind] = g.sum()
    
df

Here is the expected output. Calculations:

date = 5/6 There is no tot_value for a previous date so the quantity column is NaN. Hence the value is also NaN. And tot_value = default value of 1,000.

date = 5/7 tot_value for previous date = 1,000. The quantity for this date is based on the tot_value of 1,000. Once the quantity is calculated the value calc is straightforward. tot_value for 5/7 = tot_value for 5/6 + sum of value for the 4 symbols on 5/7. tot_value for 5/7 = 1,000 + sum(7,000 + 7,500 + 8,000 + 8,500) = 32,000

date = 5/8 tot_value for previous date = 32,000. The quantity for this date is based on the tot_value of 32,000. Once the quantity is calculated the value calc is straightforward. tot_value for 5/8 = tot_value for 5/7 + sum of value for the 4 symbols on 5/8. tot_value for 5/8 = 32,000 + sum(288,000 + 304,000 + 320,000 + 336,000) = 1,280,000.

symbol tot_value (expected) mult quantity (expected) value (expected)
date
5/6/2024 A 1,000 1 NaN NaN
5/6/2024 B 1,000 1.1 NaN NaN
5/6/2024 C 1,000 1.2 NaN NaN
5/6/2024 D 1,000 1.3 NaN NaN
5/7/2024 A 32,000 1.4 1,400 7,000
5/7/2024 B 32,000 1.5 1,500 7,500
5/7/2024 C 32,000 1.6 1,600 8,000
5/7/2024 D 32,000 1.7 1,700 8,500
5/8/2024 A 1,280,000 1.8 57,600 288,000
5/8/2024 B 1,280,000 1.9 60,800 304,000
5/8/2024 C 1,280,000 2 64,000 320,000
5/8/2024 D 1,280,000 2.1 67,200 336,000
5/9/2024 A 61,440,000 2.2 2,816,000 14,080,000
5/9/2024 B 61,440,000 2.3 2,944,000 14,720,000
5/9/2024 C 61,440,000 2.4 3,072,000 15,360,000
5/9/2024 D 61,440,000 2.5 3,200,000 16,000,000

expected output


Solution

  • Your computation is inherently iterative, therefore a loop is a valid approach.

    This is however not the classical type of pandas operation (usually vectorized).

    One option, assuming the data is sorted by date, then symbol, would be to loop over a groupby (ignoring the first date):

    # initial total
    tot = df['tot_value'].iloc[0]
    
    dates = df.index.unique()
    
    for d, g in df.loc[dates[1]:].groupby('date', sort=False):
        qty = tot*g['mult']
        val = qty*5
        tot += val.sum()
        df.loc[d, 'quantity'] = qty
        df.loc[d, 'value'] = val
        df.loc[d, 'tot_value'] = tot
    

    Output:

               symbol  tot_value  mult  quantity     value
    date                                                  
    05/06/2024      A       1000   1.0         0         0
    05/06/2024      B       1000   1.1         0         0
    05/06/2024      C       1000   1.2         0         0
    05/06/2024      D       1000   1.3         0         0
    05/07/2024      A      32000   1.4      1400      7000
    05/07/2024      B      32000   1.5      1500      7500
    05/07/2024      C      32000   1.6      1600      8000
    05/07/2024      D      32000   1.7      1700      8500
    05/08/2024      A    1280000   1.8     57600    288000
    05/08/2024      B    1280000   1.9     60800    304000
    05/08/2024      C    1280000   2.0     64000    320000
    05/08/2024      D    1280000   2.1     67200    336000
    05/09/2024      A   61440000   2.2   2816000  14080000
    05/09/2024      B   61440000   2.3   2944000  14720000
    05/09/2024      C   61440000   2.4   3072000  15360000
    05/09/2024      D   61440000   2.5   3200000  16000000