Python: Sum NAs with relativedelta


I have 2 columns in a dataset of 327 records:

 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----               
 0   JD                         327 non-null    datetime64[ns]       
 1   CD                         312 non-null    Int64

I want to generate a third column (['theoretical_eoc']) containing the date in [JD] plus the number of months specified in [CD]. But when I define this new column using:

df['theoretical_eoc'] = turnover.apply(lambda x: x.JD + relativedelta(months=x.CD), axis=1)

I receive the following error message:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'

So, I defined a function that returns NaT when the value in either column is NA:

def rd_na(a, b):
    if pd.isnull(a) or pd.isnull(b):
        pd.NaT
    else:
        a + relativedelta(months = b)

But when I apply it:

df['theoretical_eoc'] = turnover.apply(lambda x: rd_na(x.JD, x.CD), axis=1)

The result is a column full of None values, when I was expecting datetime64[ns] with some NaT. What am I doing wrong? How could I accomplish this task?


Solution

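  • Your rd_na function is missing its return statements. A Python function that ends without a return yields None, which is why the whole column is None: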

    def rd_na(a, b):
        if pd.isnull(a) or pd.isnull(b):
            return pd.NaT
        else:
            return a + relativedelta(months = b)
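
As a quick check, here is a minimal sketch that runs the corrected function on a small made-up frame. The sample values are hypothetical, and the `int(b)` cast is an added assumption to make sure `relativedelta` receives a plain Python integer from the nullable Int64 column.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta


def rd_na(a, b):
    # Return NaT when either input is missing; otherwise shift the date by b months.
    if pd.isnull(a) or pd.isnull(b):
        return pd.NaT
    # int(b) is an assumption added here so relativedelta gets a plain int.
    return a + relativedelta(months=int(b))


# Hypothetical sample data mirroring the dtypes in the question.
turnover = pd.DataFrame({
    "JD": pd.to_datetime(["2020-01-31", "2021-06-15", "2022-03-01"]),
    "CD": pd.array([1, pd.NA, 12], dtype="Int64"),
})

turnover["theoretical_eoc"] = turnover.apply(lambda x: rd_na(x.JD, x.CD), axis=1)
print(turnover)
```

The second row, where CD is missing, should come out as NaT, while the other rows are shifted by their month counts.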
    

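    Alternatively, consider using DateOffset from pandas: adding it to pd.NaT returns pd.NaT, so a missing date in JD needs no special handling. Rows where CD is <NA> may still need a guard like the one in rd_na.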

    from pandas.tseries.offsets import DateOffset
    
    df['theoretical_eoc'] = turnover.apply(lambda x: x.JD +
                                           DateOffset(months=x.CD), axis=1)
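
If CD can be missing as well, a guarded variant of the DateOffset approach might look like the sketch below. The helper name `add_months` and the sample data are hypothetical; only the month count is checked explicitly, on the assumption that a missing date propagates as NaT by itself.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical sample data: one missing date and one missing month count.
turnover = pd.DataFrame({
    "JD": pd.to_datetime(["2020-01-31", None, "2022-03-01"]),
    "CD": pd.array([1, 6, pd.NA], dtype="Int64"),
})


def add_months(jd, cd):
    # Guard the month count explicitly; a missing date needs no guard
    # because NaT + DateOffset evaluates to NaT.
    if pd.isna(cd):
        return pd.NaT
    return jd + DateOffset(months=int(cd))


turnover["theoretical_eoc"] = turnover.apply(
    lambda x: add_months(x.JD, x.CD), axis=1
)
print(turnover)
```

Here both the row with a missing JD and the row with a missing CD should end up as NaT.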