I am trying to calculate future dates by adding a column with number of days df['num_days'] to another column df["sampling_date"] but getting Overflow in int64 addition. Source code-
df['sampling_date']=pd.to_datetime(df['sampling_date'], errors='coerce')
df['future_date'] = df['sampling_date'] + pd.to_timedelta(df['num_days'], unit='D')
df['future_date'] = pd.to_datetime(df['future_date']).dt.strftime('%Y-%m-%d')
df['future_date'] = df['future_date'].astype(np.str)
df['future_date'] = np.where(df['num_days']<=0,0, df['future_date'])
for column df['num_days'], the values are as follows [0, 866, 729, 48357555, 567, 478]
I am trying to run this in unix server. Please help me resolving it.
The issue is this value: 48357555
You can create a simple function as shown below to return NaT
if error is thrown:
import numpy as np
import pandas as pd
# Here is an example df
df = pd.DataFrame({
'sampling_date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01'],
'num_days': [0, 866, 729, 48357555, 567, 478]
})
df['sampling_date'] = pd.to_datetime(df['sampling_date'], errors='coerce')
def calculate_future_date(row):
try:
return row['sampling_date'] + pd.to_timedelta(row['num_days'], unit='D')
except:
return pd.NaT
# Apply the function to each row
df['future_date'] = df.apply(calculate_future_date, axis=1)
df['future_date'] = np.where(df['num_days'] <= 0, df['sampling_date'], df['future_date'])
df['future_date'] = df['future_date'].dt.strftime('%Y-%m-%d').replace(pd.NaT, '0').astype(str)
print(df)
sampling_date num_days future_date
0 2022-01-01 0 2022-01-01
1 2022-02-01 866 2024-06-16
2 2022-03-01 729 2024-02-28
3 2022-04-01 48357555 0
4 2022-05-01 567 2023-11-19
5 2022-06-01 478 2023-09-22