Search code examples
pythonpandaspandas-groupbypython-datetimequantitative-finance

Pandas P&L rollup to the next business day


I'm having a hard time trying to do this efficiently. I have some stocks and daily P&L info in a dataframe. In reality, I have millions of rows of data so efficiency matters a lot! The Dataframe looks like :

-------------------------------
| Date       | Security | P&L |
-------------------------------
| 2016-01-01 | AAPL     | 100 |
-------------------------------
| 2016-01-02 | AAPL     | 200 |
-------------------------------
| 2016-01-03 | AAPL     | 300 |
-------------------------------
| 2016-01-04 | AAPL     | -200 |
-------------------------------

All, I want to do is roll the P&L over to the next business day (exclude all US holidays and weekends) So, the resultant Dataframe looks like this:

-------------------------------
| Date       | Security | P&L |
-------------------------------
| 2016-01-04 | AAPL     | 400 |
-------------------------------

I'm looking for an efficient way to achieve this. I do have thousands of securities and over 5 yrs of data to process so brute force can't work, unfortunately!

Thanks in advance and highly appreciate any pointers on this!


Solution

  • We can create the DataFrame of business dates then merge_asof. Then we can group on this to get the sums.

    import pandas as pd
    from pandas.tseries.holiday import USFederalHolidayCalendar
    
    #df['Date'] = pd.to_datetime(df.Date)
    date_min = '2015-01-01'
    date_max = '2016-12-31'
    
    cal = USFederalHolidayCalendar()
    holidays = cal.holidays(date_min, date_max).tolist()
    df2 = pd.DataFrame({'bdate': pd.bdate_range(date_min, date_max, 
                                                holidays=holidays, freq='C')})
    
    res = pd.merge_asof(df, df2, left_on='Date', right_on='bdate', direction='forward')
    #        Date Security  P&L      bdate
    #0 2016-01-01     AAPL  100 2016-01-04
    #1 2016-01-02     AAPL  200 2016-01-04
    #2 2016-01-03     AAPL  300 2016-01-04
    #3 2016-01-04     AAPL -200 2016-01-04
    
    res.groupby(['Security', 'bdate'])['P&L'].sum()
    #Security  bdate     
    #AAPL      2016-01-04    400