Search code examples
pythonpandasdatetimedate-range

How do I compensate holidays - pandas


Let's say I have a data frame as follows:

df = pd.DataFrame({'Ending Date': [Timestamp('2019-12-08 00:00:00'), Timestamp('2019-12-08 00:00:00')], 'FName': ['Jon', 'Bob'], 'LName': ['Doe', 'Smith'], 'Starting Date': ['2019-09-29', '2019-09-29']})

  Ending Date FName  LName Starting Date
0  2019-12-07   Jon    Doe    2019-09-28
1  2019-12-07   Bob  Smith    2019-09-28

As you can see, the Ending Date columns is always 10 weeks ahead of the Starting Date, however, I have a list of holidays:

holidays = pd.Series([Timestamp('2019-09-14 00:00:00'), Timestamp('2019-10-05 00:00:00'), Timestamp('2019-10-12 00:00:00'), Timestamp('2019-10-26 00:00:00'), Timestamp('2019-12-21 00:00:00'), Timestamp('2019-12-28 00:00:00'), Timestamp('2020-01-04 00:00:00'), Timestamp('2020-01-25 00:00:00'), Timestamp('2020-02-01 00:00:00'), Timestamp('2020-02-29 00:00:00'), Timestamp('2020-04-04 00:00:00'), Timestamp('2020-05-02 00:00:00')])

Therefore, I want to "compensate" the holidays, so I want to get every Saturday of the range between the Starting Date and the Ending Date that's in the holiday Series to be counted, and add the n (the count) weeks to the Ending Date, and, if any of the added weeks are holidays, compensate them as well, and so on...

I tried:

df['Ending Date'] = df['Ending Date'] + pd.Timedelta(weeks=10 + pd.date_range(df['Starting Date'], df['Ending Date']).isin(holidays).sum())

But an error of:

TypeError: Cannot convert input [0    2019-09-28
1    2019-09-28
Name: Starting Date, dtype: object] of type <class 'pandas.core.series.Series'> to Timestamp

Is raised.

Desired output:

  Ending Date FName  LName Starting Date
0  2020-01-18   Jon    Doe    2019-09-28
1  2020-01-18   Bob  Smith    2019-09-28

Solution

  • I assume that both Starting Date and Ending Date should be of datetime64[ns] type. If they are not, convert them with pd.to_datetime.

    I noticed that you use only Saturday dates, so your case is something like we had a business week including only one working day in a calendar week, namely only Saturday.

    Then, to do your task, it is quite easy if we make use of Custom Business Calendar, with user defined holiday dates.

    Start with defining CustomBusinessDay offset, including your hoilday list:

    my_bday = pd.offsets.CustomBusinessDay(holidays=holidays, weekmask='Sat')
    

    Then, to compute the date n business days (actually - also weeks) ahead from a given date dat, we should use formula: dat + 10 * my_bday.

    So in your case (source data in Starting Date column and the result to be saved in Ending Date), run:

    df['Ending Date'] = df['Starting Date'].apply(lambda dat: dat + 10 * my_bday)