Search code examples
pythonpandaspandas-groupbypandas-apply

Python pandas groupby returning all entries rather than grouped entries


I have a time-series of returns in a pandas dataframe with a date index and return column labeled 'TRI', as such:

VALUE_DATE  TRI        
2007-06-26 -0.000727
2007-06-27  0.015004
2007-06-28  0.000758
2007-06-29 -0.006408
2007-07-02  0.013844
2007-07-03  0.003866

I'm trying to aggregate cumulative return over varying intervals using the pandas groupby method. I've created a custom method:

def cumRets(z):
        return np.exp(np.log(1 + z).cumsum()) - 1

But I'm not able use it successfully within the apply call.

For reference, this groups (by year) as expected, albeit with a simple sum:

returns.groupby(returns.index.year).sum()

Output:

VALUE_DATE  TRI        
2007       -0.046283
2008       -0.240282
2009        0.259417
2010        0.268445
2011        0.054842
2012        0.162453
2013        0.331585
2014        0.063425
2015       -0.009367
2016        0.242511
2017        0.132732
2018       -0.099919
2019        0.233057
2020       -0.002414

But applying my cumRets method, I'm getting all rows instead of the aggregated year totals. This call returns the correct values on the final day of each month, but returns every row leading up to it instead of just the specified group by value:

returns.groupby(returns.index.year).apply(cumRets) 

Output:

VALUE_DATE  TRI
2019-12-26  0.250672
2019-12-27  0.247278
2019-12-30  0.246734
2019-12-31  0.248562
2020-01-02  0.000143
2020-01-03 -0.002414

Desired output would look like this (abridged):

VALUE_DATE  TRI        
...
...
2019        0.248562
2020       -0.002414

Question 1a, is how I would do the same for monthly and weekly intervals, where desired output would look something like this, respectively (format only, values are placeholders):

Output, Monthly:

VALUE_DATE  TRI
...
...
    2019-12  0.066746
    2020-01 -0.002414

Output, Weekly:

VALUE_DATE  TRI
...
...
    2019-12-w3  0.013228
    2019-12-w4  0.022367
    2020-01-w1 -0.002414

Solution

  • Here's a way you can try using resample:

    def cumRets(z):
            return z.add(1).prod().sub(1)
    
    
    # yearly
    df.resample('Y', kind='period').apply(cumRets)
    
    # monthly
    df.resample('M', kind='period').apply(cumRets)