
Pandas, Handling "Out of bounds timestamp..."


I have a df with certain features stored as object dtype that I want to convert to datetimes. When I try to convert them with pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To get around this, I add the errors='coerce' argument and then try to drop all the NaT values that result. For example:

pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)

Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):

df['maturity_date'].describe()

count        3355323
unique         11954
top       2015-12-01
freq           29607
Name: maturity_date, dtype: object

Furthermore, if I try to convert maturity_date again with pd.to_datetime but without errors='coerce', I get the "Out of bounds timestamp" error again.

I hope I have described this problem thoroughly.

Any thoughts?


Solution

  • pd.to_datetime is not an in-place operation. Your code performs a conversion and then discards the result. Assign the result back instead:

    # date_features is a list of column names; apply runs pd.to_datetime on each one
    df[date_features] = df[date_features].apply(pd.to_datetime, errors='coerce')
    

    Furthermore, don't call dropna on a selection of columns from a DataFrame: the selection is a separate object, so dropping rows from it does not modify the DataFrame (even with inplace=True). Instead, call dropna on the DataFrame itself and pass the columns through the subset argument:

    df.dropna(subset=date_features, inplace=True)
    

    Now maturity_date will look something like this:

    df["maturity_date"].head()
    
    0   2017-04-01
    1   2017-04-01
    2   2017-04-01
    3   2016-01-15
    4   2016-01-15
    Name: maturity_date, dtype: datetime64[ns]
    

    As you can see, the dtype is now datetime64[ns], so the conversion worked. describe() behaves differently: it computes a few summary statistics and returns them as a new Series. That Series is displayed like any other, including a dtype line, but that dtype belongs to the summary Series itself, not to the column it describes. To check the column's type, look at df['maturity_date'].dtype instead.
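Putting the three points together, here is a minimal sketch on a small made-up DataFrame. The column names issue_date and amount, the list date_features, and the "not a date" values are hypothetical stand-ins for the question's data. The "not a date" strings play the role of values that cannot be converted; an out-of-range date would be coerced to NaT in the same way when pandas parses at nanosecond resolution.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: dates arrive as strings (object dtype).
df = pd.DataFrame(
    {
        "maturity_date": ["2017-04-01", "2016-01-15", "not a date"],
        "issue_date": ["2012-04-01", "not a date", "2011-01-15"],
        "amount": [100, 200, 300],
    }
)
date_features = ["maturity_date", "issue_date"]  # hypothetical list of date columns

# 1. pd.to_datetime returns a new object, so the result must be assigned back.
#    DataFrame.apply forwards errors="coerce" to pd.to_datetime for each column.
df[date_features] = df[date_features].apply(pd.to_datetime, errors="coerce")

# 2. Drop rows that contain NaT in any date column, on the DataFrame itself.
df.dropna(subset=date_features, inplace=True)

# 3. Inspect the column's dtype directly rather than reading it off describe().
print(df["maturity_date"].dtype)               # a datetime64 dtype: the column itself
print(df["maturity_date"].describe().dtype)    # object: the summary Series' own dtype
print(df)
```

Only the first row survives, because each of the other two rows has one unconvertible date.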
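The "Out of bounds" error itself comes from the range of nanosecond-resolution timestamps. These are stored as 64-bit integer nanoseconds since 1970, so they cannot represent dates much before 1677-09-21 or after 2262-04-11. Because errors='coerce' silently turns such values (and unparseable strings) into NaT, it can be worth checking what was discarded before dropping rows. The sketch below assumes a hypothetical raw string column. Note that newer pandas versions (2.0+) can also store other resolutions, so exactly which dates overflow may depend on your version.

```python
import pandas as pd

# The representable range at nanosecond resolution.
print("earliest:", pd.Timestamp.min)
print("latest:  ", pd.Timestamp.max)

# Hypothetical raw column as it might arrive from a CSV: strings plus a missing value.
raw = pd.Series(["2015-12-01", "not a date", None, "2016-01-15"], name="maturity_date")

converted = pd.to_datetime(raw, errors="coerce")

# A value was discarded by the coercion if it was present in the raw data
# but came back as NaT; missing values that were already missing are not counted.
lost = raw.notna() & converted.isna()
print(raw[lost])
```

Looking at raw[lost] before calling dropna shows whether the dropped rows are genuinely bad data or valid dates that simply fall outside the supported range.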