Search code examples
pythonpandasportfoliostocks

Running total per category for each day in pandas dataframe


I have a pandas dataframe with stock transactions which do not happen every day and not for each stock:

Goal is to get the (daily) weights of each stock for each day.

Starting table and expected result

This means - creating a full calendar of dates - repeating the cumulative shares for each stock on each date - and finally calculate the weight for this date

Would somone be able to help me on this? I was already searching through several threads but I am not able to find any working solution.


Solution

  • Awesome! This inspired me finding a way to solve the issue! The issue in your solution was that if a stock D appears (added in below set) in the initial dataset it won't work anymore.

    I was able to solve this with the following:

    import pandas as pd
    import datetime
    
    # create df // build data // adding date as column
    trades = pd.DataFrame()
    trades['Date'] = pd.to_datetime(['2011-02-16', '2011-02-16', '2011-02-17', '2014-03-20','2014-03-20', '2018-01-04', '2011-02-18'])
    trades['stock'] = ['A', 'B', 'A', 'B', 'C', 'B', 'D']
    trades['shares_Tr'] = [5,10,5,10,15,-20,5]
    
    # create a range of dates for the merged dataframe
    index_of_dates = pd.date_range('2011-02-10', pd.datetime.today()).to_frame().reset_index(drop=True).rename(columns={0: 'Date'})
    
    # create a merged dataframe with columns date / stock / stock_Tr. 
    merged = pd.merge(index_of_dates,trades,how='left', on='Date')
    
    # create a pivottable showing the shares_TR of each stock for each date 
    shares_tr = merged.pivot(index='Date', columns='stock', values='shares_Tr').dropna(axis=1, how='all').fillna(0)
    
    # calculate individual pivottables for the cumsum and weights 
    cumShares = shares_tr.cumsum()
    weights = ((cumShares.T / cumShares.T.sum()).T*100).round(2)
    
    # finally combine all data into one dataframe
    all_data = pd.concat([shares_tr, cumShares, weights], axis=1, keys=['Shares','cumShares', 'Weights'])
    all_data