Here is my dataframe. There is a date index and there are 4 symbols for each date. I want to loop over each date for each symbol. The 'quantity' column is calculated based on the 'tot_value' of the previous date. The 'tot_value' is computed for a specific date and is common for all symbols. The 'value' column varies for each symbol for each date.
It is an issue with the way I am using shift here. It does not reference the previous date value. Instead it uses the default value of tot_value that I use while populating the dataframe. However, in the final result the tot_value is getting computed correctly.
I am new to python and would appreciate any help with this loop.
Here is my code.
import pandas as pd
# create the dataframe
data = {'symbol': ['A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D','A', 'B', 'C', 'D'],
'date':['05/06/2024','05/06/2024','05/06/2024','05/06/2024',
'05/07/2024','05/07/2024','05/07/2024','05/07/2024',
'05/08/2024','05/08/2024','05/08/2024','05/08/2024',
'05/09/2024','05/09/2024','05/09/2024','05/09/2024'],
'tot_value': [1000, 1000, 1000, 1000,1000, 1000, 1000, 1000,1000, 1000, 1000, 1000,1000, 1000, 1000, 1000],
'mult': [1, 1.1, 1.2, 1.3,1.4, 1.5, 1.6, 1.7,1.8, 1.9, 2, 2.1,2.2, 2.3, 2.4, 2.5],
'quantity': [0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0],
'value': [0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0,0, 0, 0, 0],
}
df = pd.DataFrame(data)
df.set_index(['date'], inplace = True)
symbols = df['symbol'].unique()
# loop by date index and symbol
for ind in df.index.unique():
for symbol in symbols:
df['quantity'][ind] = df['tot_value'][ind].shift(1) * df['mult'][ind]
df['value'][ind] = df['quantity'][ind] * 5
g = df.groupby('date')['value'].sum()
df['tot_value'][ind] = g.sum()
df
Here is the expected output. Calculations:
date = 5/6 There is no tot_value for a previous date so the quantity column is NaN. Hence the value is also NaN. And tot_value = default value of 1,000.
date = 5/7 tot_value for previous date = 1,000. The quantity for this date is based on the tot_value of 1,000. Once the quantity is calculated the value calc is straightforward. tot_value for 5/7 = tot_value for 5/6 + sum of value for the 4 symbols on 5/7. tot_value for 5/7 = 1,000 + sum(7,000 + 7,500 + 8,000 + 8,500) = 32,000
date = 5/8 tot_value for previous date = 32,000. The quantity for this date is based on the tot_value of 32,000. Once the quantity is calculated the value calc is straightforward. tot_value for 5/8 = tot_value for 5/7 + sum of value for the 4 symbols on 5/8. tot_value for 5/8 = 32,000 + sum(288,000 + 304,000 + 320,000 + 336,000) = 1,280,000.
symbol | tot_value (expected) | mult | quantity (expected) | value (expected) | |
---|---|---|---|---|---|
date | |||||
5/6/2024 | A | 1,000 | 1 | NaN | NaN |
5/6/2024 | B | 1,000 | 1.1 | NaN | NaN |
5/6/2024 | C | 1,000 | 1.2 | NaN | NaN |
5/6/2024 | D | 1,000 | 1.3 | NaN | NaN |
5/7/2024 | A | 32,000 | 1.4 | 1,400 | 7,000 |
5/7/2024 | B | 32,000 | 1.5 | 1,500 | 7,500 |
5/7/2024 | C | 32,000 | 1.6 | 1,600 | 8,000 |
5/7/2024 | D | 32,000 | 1.7 | 1,700 | 8,500 |
5/8/2024 | A | 1,280,000 | 1.8 | 57,600 | 288,000 |
5/8/2024 | B | 1,280,000 | 1.9 | 60,800 | 304,000 |
5/8/2024 | C | 1,280,000 | 2 | 64,000 | 320,000 |
5/8/2024 | D | 1,280,000 | 2.1 | 67,200 | 336,000 |
5/9/2024 | A | 61,440,000 | 2.2 | 2,816,000 | 14,080,000 |
5/9/2024 | B | 61,440,000 | 2.3 | 2,944,000 | 14,720,000 |
5/9/2024 | C | 61,440,000 | 2.4 | 3,072,000 | 15,360,000 |
5/9/2024 | D | 61,440,000 | 2.5 | 3,200,000 | 16,000,000 |
Your computation is inherently iterative, therefore a loop is a valid approach.
This is however not the classical type of pandas operation (usually vectorized).
One option, assuming the data is sorted by date, then symbol, would be to loop over a groupby
(ignoring the first date):
# initial total
tot = df['tot_value'].iloc[0]
dates = df.index.unique()
for d, g in df.loc[dates[1]:].groupby('date', sort=False):
qty = tot*g['mult']
val = qty*5
tot += val.sum()
df.loc[d, 'quantity'] = qty
df.loc[d, 'value'] = val
df.loc[d, 'tot_value'] = tot
Output:
symbol tot_value mult quantity value
date
05/06/2024 A 1000 1.0 0 0
05/06/2024 B 1000 1.1 0 0
05/06/2024 C 1000 1.2 0 0
05/06/2024 D 1000 1.3 0 0
05/07/2024 A 32000 1.4 1400 7000
05/07/2024 B 32000 1.5 1500 7500
05/07/2024 C 32000 1.6 1600 8000
05/07/2024 D 32000 1.7 1700 8500
05/08/2024 A 1280000 1.8 57600 288000
05/08/2024 B 1280000 1.9 60800 304000
05/08/2024 C 1280000 2.0 64000 320000
05/08/2024 D 1280000 2.1 67200 336000
05/09/2024 A 61440000 2.2 2816000 14080000
05/09/2024 B 61440000 2.3 2944000 14720000
05/09/2024 C 61440000 2.4 3072000 15360000
05/09/2024 D 61440000 2.5 3200000 16000000