python python-3.x pandas dataframe python-datetime

Adding months to a date which is bigger than the limit of Timestamp type

I have a df where one of the column is a date with a datetime64[ns] type.

Over this column I want to add months using another column of the dataframe as a base:

df['date_shifted']=df['date'].values.astype('datetime64[M]')+(df['months']).values.astype('timedelta64[M]')

My problem comes when I exceed the maximum of the datetime64 type and I retrieve the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2846-04-30 00:00:00

Are there any way to work around this error and add the months that I need to my dataframe?

Some example of possible data which would be an error after the calculation:

date	months
28-01-2017	9999
13-05-2018	9999
22-03-2016	9999
05-12-2007	9999

Note: I know that I can coerce the errors to NaT but I need the dates for following calculations.

Solution

You can use Periods, as per the Representing out-of-bounds spans section of the guide on timestamps posted by @HenryEcker in comments. To convert the column simply use .dt.to_period():

>>> df['date'].dt.to_period(freq='M')
0    2017-01
1    2018-05
2    2016-03
3    2007-05
Name: date, dtype: period[M]

The rest is easy, adding the int64 months can even be done without conversion:

>>> df['shifted_date'] = df['date'].dt.to_period(freq='M') + df['months']
>>> df
        date  months shifted_date
0 2017-01-28    9999      2850-04
1 2018-05-13    9999      2851-08
2 2016-03-22    9999      2849-06
3 2007-05-12    9999      2840-08
>>> df['shifted_date']
0    2850-04
1    2851-08
2    2849-06
3    2840-08
Name: shifted_date, dtype: period[M]

Based on the dates you have you could use a smaller granularity period:

>>> df['shifted_date'].astype('Period[D]')
0    2850-04-30
1    2851-08-31
2    2849-06-30
3    2840-08-31
Name: shifted_date, dtype: period[D]

Going back to datetimes would trigger the overflow you’re trying to avoid:

>>> df['shifted_date'].dt.start_time
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.8/site-packages/pandas/core/accessor.py", line 78, in _getter
    return self._delegate_property_get(name)
  File "/usr/lib64/python3.8/site-packages/pandas/core/indexes/accessors.py", line 70, in _delegate_property_get
    result = getattr(values, name)
  File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 420, in start_time
    return self.to_timestamp(how="start")
  File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 465, in to_timestamp
    new_data = libperiod.periodarr_to_dt64arr(new_data.asi8, base)
  File "pandas/_libs/tslibs/period.pyx", line 977, in pandas._libs.tslibs.period.periodarr_to_dt64arr
  File "pandas/_libs/tslibs/conversion.pyx", line 246, in pandas._libs.tslibs.conversion.ensure_datetime64ns
  File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2850-04-01 00:00:00