
Improve speed of finding ending value of portfolio after spending fixed rate


I have a pd.DataFrame of yearly return series and a fixed spending rate of 5%. I am looking to find the ending portfolio value after spending for each year. In year t, val_before_spending is year t-1's val_after_spending grown by year t's return; spending is the spending rate times the average of year t-1's val_after_spending and year t's val_before_spending; and val_after_spending is val_before_spending minus spending. For the first year, the year t-1 val_after_spending is assumed to be 1.
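
For example, working the first year by hand (this matches the first row of the output below):

# year 0, starting from val_after_spending = 1
val_before_spending = 1 * (1 + 0.10)      # = 1.100000
spending = 0.05 * (1 + 1.10) / 2          # = 0.052500
val_after_spending = 1.10 - 0.0525        # = 1.047500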

I currently have a working implementation (below), but it is incredibly slow. Is there a faster way to implement this?

import pandas as pd
import numpy as np   
port_rets = pd.DataFrame({'port_ret': [.10,-.25,.15]})

spending_rate = .05

for index, row in port_rets.iterrows():
    if index != 0:
        port_rets.at[index, 'val_before_spending'] = port_rets['val_after_spending'][index - 1] * (1 + port_rets['port_ret'][index])
        port_rets.at[index, 'spending'] = np.mean([port_rets['val_after_spending'][index - 1], port_rets['val_before_spending'][index]]) * spending_rate 
    else:
        port_rets.at[index, 'val_before_spending'] = 1 * (1 + port_rets['port_ret'][index])
        port_rets.at[index, 'spending'] = np.mean([1, port_rets['val_before_spending'][index]]) * spending_rate

    port_rets.at[index, 'val_after_spending'] = port_rets['val_before_spending'][index] - port_rets['spending'][index]

#   port_ret    val_before_spending spending    val_after_spending
#0  0.100000    1.100000            0.052500    1.047500
#1  -0.250000   0.785625            0.045828    0.739797
#2  0.150000    0.850766            0.039764    0.811002

Solution

  • Your code interfaces very heavily with pandas, which is a bad idea as far as performance is concerned: to be as convenient as it is, pandas does a lot of bookkeeping on every element access, and that overhead dominates an element-wise loop like yours.

    Instead, we do all of the computation in NumPy and, once we have all the building blocks, build the DataFrame at the end. The code translates to:

    def get_vals(rates, spending_rate):
        n = len(rates)
        vals_after_spending = np.zeros((n+1, ))
        vals_before_spending = np.zeros((n+1, ))
    
        vals_after_spending[0] = 1.0
    
        for i in range(n):
            vals_before_spending[i+1] = vals_after_spending[i] * (1 + rates[i])
    
            spending = np.mean(np.array([vals_after_spending[i], vals_before_spending[i+1]])) * spending_rate
            vals_after_spending[i+1] = vals_before_spending[i+1] - spending
    
        return vals_before_spending[1:], vals_after_spending[1:]
    
    rates = port_rets["port_ret"].to_numpy()
    vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)
    port_rets = pd.DataFrame({'port_ret': rates, "val_before_spending": vals_before_spending, "val_after_spending": vals_after_spending})
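
    As a quick sanity check on the small three-return example from the question (expected values copied from its printed output), one can verify that the NumPy version reproduces the same numbers:

    # expected val_after_spending from the original loop, rounded to 6 decimals
    expected_after = np.array([1.047500, 0.739797, 0.811002])
    assert np.allclose(vals_after_spending, expected_after, atol=1e-6)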
    

    We can improve further by JIT-compiling the function, since plain Python loops are slow. Below I use Numba:

    import numba as nb
    @nb.njit(cache=True)  # as easy as putting this decorator
    def get_vals(rates, spending_rate):
        n = len(rates)
        vals_after_spending = np.zeros((n+1, ))
        vals_before_spending = np.zeros((n+1, ))
    
        # ... the body stays exactly the same as above; we are just compiling the function
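
    The compiled function is called exactly like the plain-Python version; only the very first call also pays the compilation cost. A small usage sketch, reusing the rates array and spending_rate from above:

    # first call: Numba compiles the function (and caches it to disk because of cache=True)
    get_vals(rates, spending_rate)
    # subsequent calls run the already-compiled machine code
    vals_before_spending, vals_after_spending = get_vals(rates, spending_rate)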
    

    If we consider a random array of rates like this:

    port_rets = pd.DataFrame({'port_ret': np.random.uniform(low=-1.0, high=1.0, size=(100000,))})
    

    We get the following performance comparison:

    Your code: 15.758 s

    get_vals: 1.407 s

    JITed get_vals: 0.093 s (on the second run, to discount compile time)
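
    For reference, a minimal sketch of how such a comparison can be timed (the figures above will of course vary by machine; the JITed version gets a warm-up call so compilation is not included):

    import time

    rates = port_rets["port_ret"].to_numpy()

    get_vals(rates, spending_rate)                  # warm-up / compilation
    start = time.perf_counter()
    get_vals(rates, spending_rate)                  # timed run
    print(f"JITed get_vals: {time.perf_counter() - start:.3f}s")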