Search code examples
pythonpandasdataframedatetimetimedelta

PerformanceWarning when adding months to datetime values in Pandas DataFrame using apply() and DateOffset


I have a Pandas DataFrame with a "date" column containing datetime values, and a "months" column containing integer values representing the number of months to add to each date. I want to calculate a new date for each row that is the original "date" plus the specified number of months in the months column.

I'm using the following code:

df['date'] + df["months"].apply(lambda y: pd.DateOffset(months=y))

This code seems to work correctly, but I'm getting a PerformanceWarning that says:

PerformanceWarning: Adding/subtracting object-dtype array to DatetimeArray not vectorized
  warnings.warn

I need to do this process with multiple date and months columns in my dataset and is taking a long time to perform this task.

Sample of input:

date months
0 2020-01-01 3
1 2021-06-15 6
2 2022-02-28 1

Sample of ouput:

date months new_date
0 2020-01-01 3 2020-04-01
1 2021-06-15 6 2021-12-15
2 2022-02-28 1 2022-03-28

What does this warning mean, and how can I improve my code to avoid it? I'm concerned that this may be affecting the performance of my code, and I want to make sure that it's as efficient as possible.


Solution

  • You can remove the performance warning by using .dt.date on the date column.

    For faster performance you can use dateutil.relativedelta:

    from dateutil.relativedelta import relativedelta
    
    df['new_date'] = df['date'].dt.date + df["months"].apply(lambda y: relativedelta(months=y))
    print(df)
    

    Prints (without performance warning):

            date  months    new_date
    0 2020-01-01       3  2020-04-01
    1 2021-06-15       6  2021-12-15
    2 2022-02-28       1  2022-03-28
    

    Quick benchmark:

    from timeit import timeit
    from dateutil.relativedelta import relativedelta
    
    
    def fn1(df):
        return df['date'].dt.date + df["months"].apply(lambda y: pd.DateOffset(months=y))
    
    def fn2(df):
        return df['date'].dt.date + df["months"].apply(lambda y: relativedelta(months=y))
    
    # create bigger dataframe
    df = pd.concat([df]*100).reset_index(drop=True)
    
    t1 = timeit('fn1(x)', setup='x = df.copy()', number=100, globals=globals())
    t2 = timeit('fn2(x)', setup='x = df.copy()', number=100, globals=globals())
    
    print(t1)
    print(t2)
    

    Prints:

    0.6759202829562128
    0.2161861159838736