Update: not sure if this is possible without some form of a loop, but np.where
will not work here. If the answer is, "you can't", then so be it. If it can be done, it may use something from scipy.signal
.
I'd like to vectorize the loop in the code below, but unsure as to how, due to the recursive nature of the output.
Walk-though of my current setup:
Take a starting amount ($1 million) and a quarterly dollar distribution ($5,000):
dist = 5000.
v0 = float(1e6)
Generate some random security/account returns (decimal form) at monthly freq:
r = pd.Series(np.random.rand(12) * .01,
index=pd.date_range('2017', freq='M', periods=12))
Create an empty Series that will hold the monthly account values:
value = pd.Series(np.empty_like(r), index=r.index)
Add a "start month" to value
. This label will contain v0
.
from pandas.tseries import offsets
value = (value.append(Series(v0, index=[value.index[0] - offsets.MonthEnd(1)]))
.sort_index())
The loop I'd like to get rid of is here:
for date in value.index[1:]:
if date.is_quarter_end:
value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
* (1 + r.loc[date]) - dist
else:
value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
* (1 + r.loc[date])
Combined code:
import pandas as pd
from pandas.tseries import offsets
from pandas import Series
import numpy as np
dist = 5000.
v0 = float(1e6)
r = pd.Series(np.random.rand(12) * .01, index=pd.date_range('2017', freq='M', periods=12))
value = pd.Series(np.empty_like(r), index=r.index)
value = (value.append(Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])).sort_index())
for date in value.index[1:]:
if date.is_quarter_end:
value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date]) - dist
else:
value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date])
In psuedocode, what is loop is doing is just:
for each date in index of value:
if the date is not a quarter end:
multiply previous value by (1 + r) for that month
if the date is a quarter end:
multiply previous value by (1 + r) for that month and subtract dist
The issue is, I don't currently see how vectorization is possible since the successive value depends on whether or not a distribution was taken in the month prior. I get to the desired result, but pretty inefficiently for higher frequency data or larger time periods.
You could use the following code:
cum_r = (1 + r).cumprod()
result = cum_r * v0
for date in r.index[r.index.is_quarter_end]:
result[date:] -= cum_r[date:] * (dist / cum_r.loc[date])
You would make:
v0
n
vector multiplication with scalar dist / cum_r.loc[date]
n
vector subtractionswhere n
is the number of quarter ends.
Based on this code we can optimize further:
cum_r = (1 + r).cumprod()
t = (r.index.is_quarter_end / cum_r).cumsum()
result = cum_r * (v0 - dist * t)
which is
(1 + r).cumprod()
r.index.is_quarter_end / cum_r
dist
v0
with dist * t
cum_r
with v0 - dist * t