I'm trying to calculate two cumulative sums given a series of financial transactions. There are 4 transactions types, each with an amount for the transaction: D - deposit W - withdraw G - gain L - loss
The data frame is created like so
import pandas as pd
import numpy as np
data = { 'Type': ['D', 'D', 'W', 'D', 'G', 'G', 'G', 'L', 'W', 'G', 'W', 'G', 'L' ],
'Amount': [10, 10, -5, 10, 5, 5, 5, -5, -10, 10, -25, 25, -30]
}
df = pd.DataFrame(data, columns = ['Type', 'Amount'])
The running Capital is easy to calculate using the cumsum(), which essentially includes all transactions.
df['Capital'] = df['Amount'].cumsum()
The other entity I want to calculate is the Principal, which represents the amount of money is input into the account. This only considers D and W type transactions. I can do a simple filter here using:
df['Principal'] = df.apply(lambda row : row['Amount'] if (row['Type'] == 'W' or row['Type'] == 'D') else 0, axis=1).cumsum()
However, this has a problem. When there are gains and there are withdraws, the withdrawal would need to withdraw from the gains before it effects that principal. The output of the above has the mistake in the result below (row 8 and 10):
Type Amount Capital Principal
0 D 10 10 10
1 D 10 20 20
2 W -5 15 15
3 D 10 25 25
4 G 5 30 25
5 G 5 35 25
6 G 5 40 25
7 L -5 35 25
8 W -10 25 15 <- should stays at 25
9 G 10 35 15 <- now wrong because of above
10 W -25 10 -10 <- error escalades
11 G 25 35 -10
12 L -30 5 -10
I am able to get the desire result by doing the following, but it seems a bit ugly. Wondering if there some easier or short hand way. I'd imagine this is a common calculation in finance.
df['Principal'] = np.nan
currentPrincipal = 0
for index, row in df.iterrows():
if (row['Type'] == 'D'):
#row['Principal'] = currentPrincipal + row['Amount']
df.loc[index, 'Principal'] = currentPrincipal + row['Amount']
elif (row['Type'] == 'W' and row['Capital'] <= currentPrincipal):
#row['Principal'] = row['Capital']
df.loc[index, 'Principal'] = row['Capital']
else:
df.loc[index, 'Principal'] = currentPrincipal
currentPrincipal = df.loc[index, 'Principal']
I was not successful in trying to use apply
as we have a dependency on the previous result of Principal that needs to be carried forward.
Correct result:
Type Amount Capital Principal
0 D 10 10 10
1 D 10 20 20
2 W -5 15 15
3 D 10 25 25
4 G 5 30 25
5 G 5 35 25
6 G 5 40 25
7 L -5 35 25
8 W -10 25 25
9 G 10 35 25
10 W -25 10 10
11 G 25 35 10
12 L -30 5 10
You can do:
# calculate cumulative withdrawals
w = df['Amount'].where(df['Type'].eq('W')).cumsum()
# calculate cumulative deposits
d = df['Amount'].where(df['Type'].eq('D'), 0).cumsum()
# calculate cumulative gain & loss
g = df['Amount'].where(df['Type'].isin(['G', 'L']), 0).cumsum()
# calculate principal = deposit + net_withdrawal(if any)
df['Principal'] = d + (g + w).where(lambda x: x < 0).ffill().fillna(0)
Result:
Type Amount Capital Principal
0 D 10 10 10.0
1 D 10 20 20.0
2 W -5 15 15.0
3 D 10 25 25.0
4 G 5 30 25.0
5 G 5 35 25.0
6 G 5 40 25.0
7 L -5 35 25.0
8 W -10 25 25.0
9 G 10 35 25.0
10 W -25 10 10.0
11 G 25 35 10.0
12 L -30 5 10.0