Search code examples
pythonpandasdataframefinance

Summing and averaging for the same days


I have data which I sorted by their days in excel, what I want to do now is get the sum of the daily returns for each day. The problem here is that I have multiple entries for the days. So I might only have one Daily Return entry for 2018-12-05 but 5 entries for 2018-12-06. I would like that I only get one entry for 2018-12-06 with both the cumulative daily return (so all cumulative returns added together) and the average daily return (so the cumulative return divided by the amount of entries for the day. For 2018-12-06 this would be divided by 5).

So the data I have right now looks like this:

            Dates  Last.Price  Daily.Return
19788  2018-11-23       75.18     -0.001199
19789  2018-11-23      129.04     -0.026490
19790  2018-11-26       77.84     -0.035382
19791  2018-11-26      127.98      0.008215
19792  2018-11-27       79.50     -0.021326
19793  2018-11-27      122.68      0.041413
19794  2018-11-28       80.27     -0.009686
19795  2018-11-29       80.00      0.003364

The final Data Frame should look like this

              Dates  Last.Price  Cum.Return   Average.Return
19788  2018-11-23       75.18     -0.027689    -0.0138445
19790  2018-11-26       77.84     -0.027167    -0.0135835
19792  2018-11-27       79.50      0.020087     0.0100435
19794  2018-11-28       80.27     -0.009686    -0.009686
19795  2018-11-29       80.00      0.003364     0.003364

I have the following code so far to sum the daily returns. However it doesn't sum correctly. And I don't know how to implement the average daily return.

df = pd.read_csv('/Python Test/SP500Acquirer.csv')

def sum_from_days_prior(row, df):
    '''returns sum of values in row month, 
    from all dates in df prior to row date'''

    day = pd.to_datetime(row).day

    all_dates_prior = df[df.index <= row]
    same_day = all_dates_prior[all_dates_prior.index.day == day]

    return same_day["Daily.Return"].sum()


df.set_index('Dates', inplace = True)
df.index = pd.to_datetime(df.index)
df["Dates"] = df.index
df.sort_index(inplace = True)

df["Day"] = df["Dates"].apply(lambda row: sum_from_days_prior (row, df))
df.drop("Dates", axis = 1, inplace = True)

print(df.tail(20))

As said before this code does not sum the daily returns correctly. And I do not know how to get the average returns for the days.


Solution

  • I think you need aggregate by agg with functions first, sum and mean:

    Because column Daily.Return is aggregates by multiple functions defined in list, get MultiIndex in output. So is necessary flatten it - simpliest is use map with join.

    df = df.groupby('Dates').agg({'Last.Price':'first', 'Daily.Return':['mean','sum']})
    
    print (df)
               Last.Price Daily.Return          
                    first         mean       sum
    Dates                                       
    2018-11-23      75.18    -0.013844 -0.027689
    2018-11-26      77.84    -0.013583 -0.027167
    2018-11-27      79.50     0.010044  0.020087
    2018-11-28      80.27    -0.009686 -0.009686
    2018-11-29      80.00     0.003364  0.003364
    
    print (df.columns)
    MultiIndex(levels=[['Last.Price', 'Daily.Return'], ['first', 'mean', 'sum']],
               labels=[[0, 1, 1], [0, 1, 2]])
    

    df.columns = df.columns.map('_'.join)
    print (df)
               Last.Price_first  Daily.Return_mean  Daily.Return_sum
    Dates                                                            
    2018-11-23             75.18          -0.013844         -0.027689
    2018-11-26             77.84          -0.013583         -0.027167
    2018-11-27             79.50           0.010044          0.020087
    2018-11-28             80.27          -0.009686         -0.009686
    2018-11-29             80.00           0.003364          0.003364
    

    Last renamecolumns:

    d = {'Last.Price_first':'Last.Price',
         'Daily.Return_sum': 'Cum.Return',
         'Daily.Return_mean': 'Average.Return'}
    
    df = df.rename(columns=d)
    print (df)
                Last.Price  Average.Return  Cum.Return
    Dates                                             
    2018-11-23       75.18       -0.013844   -0.027689
    2018-11-26       77.84       -0.013583   -0.027167
    2018-11-27       79.50        0.010044    0.020087
    2018-11-28       80.27       -0.009686   -0.009686
    2018-11-29       80.00        0.003364    0.003364