I have 2 columns in a dataset of 327 records:
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   JD      327 non-null    datetime64[ns]
 1   CD      312 non-null    Int64
And I want to generate a third one (['theoretical_eoc']) that gives me the dates kept in [JD] plus a number of months specified in [CD]. But when I define this new column by using:
df['theoretical_eoc'] = turnover.apply(lambda x: x.JD + relativedelta(months=x.CD), axis=1)
I receive the following error message:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NAType'
So, I defined a function to put NaT in case one of the values in either column is NA:
def rd_na(a, b):
    if pd.isnull(a) or pd.isnull(b):
        pd.NaT
    else:
        a + relativedelta(months = b)
But when I apply it:
df['theoretical_eoc'] = turnover.apply(lambda x: rd_na(x.JD, x.CD), axis=1)
The result is a column full of None values, when I was expecting datetime64[ns] with some NaT. What am I doing wrong? How could I accomplish this task?
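You are missing the return statements in the rd_na function. Without them, both branches evaluate their expression and discard it, so the function always returns None: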
def rd_na(a, b):
    if pd.isnull(a) or pd.isnull(b):
        return pd.NaT
    else:
        return a + relativedelta(months = b)
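Consider using DateOffset from pandas, since adding a DateOffset to pd.NaT gives pd.NaT. Note that this only covers a missing date in JD. A missing value in CD is still passed to the offset constructor, so it is safer to keep a pd.isnull check for that column:
from pandas.tseries.offsets import DateOffset
df['theoretical_eoc'] = turnover.apply(lambda x: x.JD + DateOffset(months=x.CD), axis=1)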
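To make the two points above concrete, here is a minimal sketch that combines the null check with DateOffset. The sample data and the helper name add_months are made up for illustration, and it uses a single frame called df throughout (the question mixes df and turnover, so adapt the names to your own code). It assumes JD is datetime64 and CD is the nullable Int64 dtype, as in the question.

```python
import pandas as pd

# Hypothetical sample data with the same dtypes as in the question.
df = pd.DataFrame({
    "JD": pd.to_datetime(["2020-01-31", "2021-06-15", None]),
    "CD": pd.array([1, None, 3], dtype="Int64"),
})

def add_months(jd, cd):
    """Return jd shifted by cd months, or NaT if either value is missing."""
    if pd.isnull(jd) or pd.isnull(cd):
        return pd.NaT
    # int() turns the NumPy integer into a plain Python int for the offset.
    return jd + pd.DateOffset(months=int(cd))

# Build the column from the two series, then convert it to a datetime dtype
# so that missing entries are stored as NaT rather than as Python objects.
df["theoretical_eoc"] = pd.to_datetime(
    [add_months(jd, cd) for jd, cd in zip(df["JD"], df["CD"])]
)

print(df)
```

Rows with a missing JD or CD end up as NaT, and the new column has a datetime dtype instead of object. Month arithmetic clips to the end of the month, so 2020-01-31 plus one month becomes 2020-02-29.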