I have a time-series of returns in a pandas dataframe with a date index and return column labeled 'TRI', as such:
VALUE_DATE TRI
2007-06-26 -0.000727
2007-06-27 0.015004
2007-06-28 0.000758
2007-06-29 -0.006408
2007-07-02 0.013844
2007-07-03 0.003866
I'm trying to aggregate cumulative return over varying intervals using the pandas groupby
method. I've created a custom method:
def cumRets(z):
return np.exp(np.log(1 + z).cumsum()) - 1
But I'm not able use it successfully within the apply
call.
For reference, this groups (by year) as expected, albeit with a simple sum:
returns.groupby(returns.index.year).sum()
Output:
VALUE_DATE TRI
2007 -0.046283
2008 -0.240282
2009 0.259417
2010 0.268445
2011 0.054842
2012 0.162453
2013 0.331585
2014 0.063425
2015 -0.009367
2016 0.242511
2017 0.132732
2018 -0.099919
2019 0.233057
2020 -0.002414
But applying my cumRets
method, I'm getting all rows instead of the aggregated year totals. This call returns the correct values on the final day of each month, but returns every row leading up to it instead of just the specified group by value:
returns.groupby(returns.index.year).apply(cumRets)
Output:
VALUE_DATE TRI
2019-12-26 0.250672
2019-12-27 0.247278
2019-12-30 0.246734
2019-12-31 0.248562
2020-01-02 0.000143
2020-01-03 -0.002414
Desired output would look like this (abridged):
VALUE_DATE TRI
...
...
2019 0.248562
2020 -0.002414
Question 1a, is how I would do the same for monthly and weekly intervals, where desired output would look something like this, respectively (format only, values are placeholders):
Output, Monthly:
VALUE_DATE TRI
...
...
2019-12 0.066746
2020-01 -0.002414
Output, Weekly:
VALUE_DATE TRI
...
...
2019-12-w3 0.013228
2019-12-w4 0.022367
2020-01-w1 -0.002414
Here's a way you can try using resample
:
def cumRets(z):
return z.add(1).prod().sub(1)
# yearly
df.resample('Y', kind='period').apply(cumRets)
# monthly
df.resample('M', kind='period').apply(cumRets)