Search code examples
pythonpandasdatevectorization

np.vectorize and relativedelta returning "relativedelta only diffs datetime/date"


I have a pandas dataframe with two datetime64[ns] columns ("d1" and "d2") representing dates. I would like to create a third column calculated as the difference between these two dates. I can't use a simple days/365 style calculation, so I am requiring relativedelta.

Using relativedelta works fine on one row:

import dateutil.relativedelta as relativedelta
relativedelta.relativedelta(df["d1"][0],df["d2"][0])
> relativedelta(years=+1)

But it fails on columns. So I vectorize it:

date_diffs=np.vectorize(relativedelta.relativedelta)

And then I try

date_diffs(df["d1"],df["d2"])

But this returns TypeError: relativedelta only diffs datetime/date

How do I fix this? Or should I simply use the apply statement or a for-loop?


Solution

  • Use list comprehension:

    df = pd.DataFrame({'d1':pd.date_range('2000-01-05', periods=3),
                       'd2':pd.date_range('2006-08-05', periods=3, freq='35M')})
    
    from dateutil.relativedelta import relativedelta
    
    def date_diffs(s, e):
        return relativedelta(s,e)
    
    df['out'] = [date_diffs(s, e) for s, e in zip(df["d1"],df["d2"])]
    print(df)
              d1         d2                                            out
    0 2000-01-05 2006-08-31   relativedelta(years=-6, months=-7, days=-26)
    1 2000-01-06 2009-07-31   relativedelta(years=-9, months=-6, days=-25)
    2 2000-01-07 2012-06-30  relativedelta(years=-12, months=-5, days=-23)
    

    If use apply it should be slowier:

    df['out'] = df.apply(lambda x:  date_diffs(x["d1"],x["d2"]) , axis=1)