I'm having a hard time trying to do this efficiently. I have some stocks and daily P&L info in a dataframe. In reality, I have millions of rows of data so efficiency matters a lot! The Dataframe looks like :
-------------------------------
| Date | Security | P&L |
-------------------------------
| 2016-01-01 | AAPL | 100 |
-------------------------------
| 2016-01-02 | AAPL | 200 |
-------------------------------
| 2016-01-03 | AAPL | 300 |
-------------------------------
| 2016-01-04 | AAPL | -200 |
-------------------------------
All, I want to do is roll the P&L over to the next business day (exclude all US holidays and weekends) So, the resultant Dataframe looks like this:
-------------------------------
| Date | Security | P&L |
-------------------------------
| 2016-01-04 | AAPL | 400 |
-------------------------------
I'm looking for an efficient way to achieve this. I do have thousands of securities and over 5 yrs of data to process so brute force can't work, unfortunately!
Thanks in advance and highly appreciate any pointers on this!
We can create the DataFrame of business dates then merge_asof
. Then we can group on this to get the sums.
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
#df['Date'] = pd.to_datetime(df.Date)
date_min = '2015-01-01'
date_max = '2016-12-31'
cal = USFederalHolidayCalendar()
holidays = cal.holidays(date_min, date_max).tolist()
df2 = pd.DataFrame({'bdate': pd.bdate_range(date_min, date_max,
holidays=holidays, freq='C')})
res = pd.merge_asof(df, df2, left_on='Date', right_on='bdate', direction='forward')
# Date Security P&L bdate
#0 2016-01-01 AAPL 100 2016-01-04
#1 2016-01-02 AAPL 200 2016-01-04
#2 2016-01-03 AAPL 300 2016-01-04
#3 2016-01-04 AAPL -200 2016-01-04
res.groupby(['Security', 'bdate'])['P&L'].sum()
#Security bdate
#AAPL 2016-01-04 400