I have a pd.DataFrame
of return series corresponding to years with a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. val_after_spending
in year t
is equal to the average of year t
val_before_spending
with year t-1
val_after_spending times the spending rate. For the first year, the val_after_spending
in t-1
is assumed to be 1.
I right now have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?
import pandas as pd
import numpy as np
port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})
spending_rate = .05
for index, row in port_rets.iterrows():
if index != 0:
port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate
else:
port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate
port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]
# port_ret val_before_spending spending val_after_spending
#0 0.100000 1.100000 0.052500 1.047500
#1 -0.250000 0.785625 0.045828 0.739797
#2 0.150000 0.850766 0.039764 0.811002
You very heavily interface with pandas in your code, which seems to be a bad idea as far as performance is concerned. To make it as easy to use as it is, pandas needs to do a lot of book keeping, which leads to reduced performance.
We do all the calculation in numpy and then having got all the building blocks, build the dataframe in the end. Thus, the code translates to :
def get_vals(rates, spending_rate):
n = len(rates)
vals_after_spending = np.zeros((n+1, ))
vals_before_spending = np.zeros((n+1, ))
vals_after_spending[0] = 1.0
for i in range(n):
vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])
spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
vals_after_spending[i+1] = vals_before_spending[i+1] - spending
return vals_before_spending[1:], vals_after_spending[1:]
rates = np.array(port_rets["port_ret"].tolist())
vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})
We can further improve by JIT compiling the code, as python loops are slow. Below I use numba :
import numba as nb
@nb.njit(cache=True) # as easy as putting this decorator
def get_vals(rates, spending_rate):
n = len(rates)
vals_after_spending = np.zeros((n+1, ))
vals_before_spending = np.zeros((n+1, ))
# ... code remains same, we are just compiling the function
If we consider a random list of rates like this :
port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})
We get the performance comparisons:
Your code : 15.758s
get_vals : 1.407s
JITed get_vals : 0.093s (on second run to discount for compile time)