Search code examples
pythonpandasdatedatetimetimedelta

Python Timedelta[M] adds incomplete days


I have a table that has a column Months_since_Start_fin_year and a Date column. I need to add the number of months in the first column to the date in the second column.

DateTable['Date']=DateTable['First_month']+DateTable['Months_since_Start_fin_year'].astype("timedelta64[M]")

This works OK for month 0, but month 1 already has a different time and for month 2 onwards has the wrong date. Image of output table where early months have the correct date but month 2 where I would expect June 1st actually shows May 31st It must be adding incomplete months, but I'm not sure how to fix it?

I have also tried

DateTable['Date']=DateTable['First_month']+relativedelta(months=DateTable['Months_since_Start_fin_year'])

but I get a type error that says

TypeError: cannot convert the series to <class 'int'>

My Months_since_Start_fin_year is type int32 and my First_month variable is datetime64[ns]


Solution

  • The problem with adding months as an offset to a date is that not all months are equally long (28-31 days). So you need pd.DateOffset which handles that ambiguity for you. .astype("timedelta64[M]") on the other hand only gives you the average days per month within a year (30 days 10:29:06).

    Ex:

    import pandas as pd
    
    # a synthetic example since you didn't provide a mre
    df = pd.DataFrame({'start_date': 7*['2017-04-01'],
                       'month_offset': range(7)})
    
    # make sure we have datetime dtype
    df['start_date'] = pd.to_datetime(df['start_date'])
    
    # add month offset
    df['new_date'] = df.apply(lambda row: row['start_date'] + 
                                          pd.DateOffset(months=row['month_offset']),
                                          axis=1)
    

    which would give you e.g.

    df
      start_date  month_offset   new_date
    0 2017-04-01             0 2017-04-01
    1 2017-04-01             1 2017-05-01
    2 2017-04-01             2 2017-06-01
    3 2017-04-01             3 2017-07-01
    4 2017-04-01             4 2017-08-01
    5 2017-04-01             5 2017-09-01
    6 2017-04-01             6 2017-10-01
    

    You can find similar examples here on SO, e.g. Add months to a date in Pandas. I only modified the answer there by using an apply to be able to take the months offset from one of the DataFrame's columns.