Search code examples
pythonpandasnumpydata-science

Pandas: Get Business Month-End Dates Correctly Even for Dates Already on Month-End


let's say i have and dataframe as below

date,ent_id,val
2021-03-23,101,61
2021-03-12,103,64
2021-03-15,101,32
2021-04-01,103,39
2021-04-02,101,71
2021-04-02,103,79
2021-04-30,101,51
2021-04-30,103,53
2021-05-31,101,28
2021-05-31,103,26
2021-05-31,101,47
2021-05-31,103,61
2021-06-06,101,45
2021-06-06,103,78
2021-06-07,101,23
2021-06-07,103,31
2021-07-31,101,14
2021-07-31,103,02
2021-07-31,101,82
2021-07-31,103,15

i want to create an addition column in dataframe which contain month end date based of following condition

case 
when DAYNAME('date')='Sunday' then days_add(date,-2)
when DAYNAME('date')='Saturday' then days_add(date,-1)
else date 

so the output would be like this

date,ent_id,val,month_end
2021-03-23,101,61,2021-03-31
2021-03-12,103,64,2021-03-31
2021-03-15,101,32,2021-03-31
2021-04-01,103,39,2021-04-30
2021-04-02,101,71,2021-04-30
2021-04-02,103,79,2021-04-30
2021-04-30,101,51,2021-04-30
2021-04-30,103,53,2021-04-30
2021-05-31,101,28,2021-05-31
2021-05-31,103,26,2021-05-31
2021-05-31,101,47,2021-05-31
2021-05-31,103,61,2021-05-31
2021-06-06,101,45,2021-06-30
2021-06-06,103,78,2021-06-30
2021-06-07,101,23,2021-06-30
2021-06-07,103,31,2021-06-30
2021-07-31,101,14,2021-07-31
2021-07-31,103,02,2021-07-31
2021-07-31,101,82,2021-07-31
2021-07-31,103,15,2021-07-31

my effort

import pandas as pd
from datetime import timedelta
from pandas.tseries.offsets import MonthEnd
import numpy as np

df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Sunday','month_end'] =[df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Sunday']['date']+timedelta(days=-2)]
df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Saturday','month_end'] =[df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Saturday']['date']+timedelta(days=-1)]

but getting this error

ValueError: Must have equal len keys and value when setting with an ndarray

any other better solution is most welcome


Solution

  • You can use pd.offsets.MonthEnd(n=0) and then adjust for Sunday and Saturday. Note that n=0 is needed or else 2021-07-31 will be rolled forward to 2021-08-31.

    df['month_end'] = df['date'] + pd.offsets.MonthEnd(n=0)
    df.loc[df['month_end'].dt.day_name() == 'Sunday', 'month_end'] -= pd.DateOffset(2)
    df.loc[df['month_end'].dt.day_name() == 'Saturday', 'month_end'] -= pd.DateOffset(1)
    
    
    print(df)
    
             date  ent_id  val  month_end
    0  2021-03-23     101   61 2021-03-31
    1  2021-03-12     103   64 2021-03-31
    2  2021-03-15     101   32 2021-03-31
    3  2021-04-01     103   39 2021-04-30
    4  2021-04-02     101   71 2021-04-30
    5  2021-04-02     103   79 2021-04-30
    6  2021-04-30     101   51 2021-04-30
    7  2021-04-30     103   53 2021-04-30
    8  2021-05-31     101   28 2021-05-31
    9  2021-05-31     103   26 2021-05-31
    10 2021-05-31     101   47 2021-05-31
    11 2021-05-31     103   61 2021-05-31
    12 2021-06-06     101   45 2021-06-30
    13 2021-06-06     103   78 2021-06-30
    14 2021-06-07     101   23 2021-06-30
    15 2021-06-07     103   31 2021-06-30
    16 2021-07-31     101   14 2021-07-30
    17 2021-07-31     103    2 2021-07-30
    18 2021-07-31     101   82 2021-07-30
    19 2021-07-31     103   15 2021-07-30
    

    Note that although you seems to want to get the last business date of a month, we cannot simply use pd.offsets.BMonthEnd(n=0) since 2021-07-31 will still be rolled forward to 2021-08-31.

    Edit: Solution using BMonthEnd

    When I first looked at the question, what immediately popped up to my mind is that probably we can make use of pd.offsets.BMonthEnd to solve the question.

    As I also know that for pd.offsets.MonthEnd, we can use parameter n=0 (or simply 0) to ensure the date is not rolled if on an anchor point (on an anchor point if the date is already on month-end date in this case). Hence, my initial trial is to use pd.offsets.BMonthEnd(n=0). To my surprise, it doesn't behave similarly as the MonthEnd counterpart. Date of 2021-07-31 is still rolled forward to 2021-08-31.

    As this kind of problem is very common, I would like to provide a workaround to enable us to still use BMonthEnd instead of providing codes to check and modify for Sunday and Saturday.

    Here's the workaround code to enable BMonthEnd to behave similarly as MonthEnd(n=0):

    df['business_month_end'] = df['date'] + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin() + pd.offsets.BMonthEnd()
    

    Here, MonthEnd(0) is the same as MonthEnd(n=0) while MonthEnd() and BMonthEnd() without passing parameter is the same as passing n=1 (the default).

    The mechanism is that we borrow the feature of MonthEnd(n=0) to keep on anchor even on anchor point and get the month-begin of that date (should be the first date within the same month) and then apply the BMonthEnd function to let it gets us the last business date of the same month (with adjustments if dates fall on Sunday & Saturday).

    Edit: Updated Solution using BMonthEnd

    Pandas provides a solution using BMonthEnd in the docs.

    If you want to get the end of the current business month:

    >>> ts = pd.Timestamp(2022, 11, 30)
    >>> pd.offsets.BMonthEnd().rollforward(ts)
    Timestamp('2022-11-30 00:00:00')