I have a df with certain features stored as object dtype, which I want to convert to datetime. When I attempt to convert them using pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To address this, I add the errors='coerce' argument and then try to drop all the NAs that result. For example:
pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)
Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):
df['maturity_date'].describe()
count 3355323
unique 11954
top 2015-12-01
freq 29607
Name: maturity_date, dtype: object
Furthermore, if I again try to convert maturity_date using pd.to_datetime without "coerce", I get the "Out of bounds timestamp" error.
I hope I have described this problem thoroughly.
Any thoughts?
pd.to_datetime is not an in-place operation. Your code performs the conversion and then discards the result. The right thing to do is to assign the result back, like so -
df['date_features'] = pd.to_datetime(df.date_features, errors='coerce')
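If date_features is actually a list of column names, as in the question, one way to apply this is to convert each column in turn and assign each result back. The following is only a minimal sketch: the sample data and the second column name, issue_date, are invented for illustration, and unparseable strings stand in for whatever values fail to convert in the real data.

```python
import pandas as pd

# Hypothetical sample data; only "maturity_date" comes from the question.
df = pd.DataFrame({
    "maturity_date": ["2017-04-01", "2016-01-15", "not a date"],
    "issue_date": ["2015-12-01", "bad value", "2014-06-30"],
})
date_features = ["maturity_date", "issue_date"]

# Convert one column at a time and assign the result back;
# values that cannot be converted become NaT because of errors="coerce".
for col in date_features:
    df[col] = pd.to_datetime(df[col], errors="coerce")

print(df.dtypes)
```

Looping over the columns keeps each conversion independent, so a bad value in one column does not affect the others.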
Furthermore, don't call dropna on a column that belongs to a dataframe, as this will not modify the dataframe (even with inplace=True). Instead, call dropna on the dataframe itself with the subset argument -
df.dropna(subset=['date_features'], inplace=True)
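To see what subset does, here is a small sketch on made-up data (the amount column and its values are hypothetical). Only rows whose maturity_date is missing are dropped; the other rows and columns are left alone.

```python
import pandas as pd

# Hypothetical frame with one missing date (NaT) in the middle row.
df = pd.DataFrame({
    "maturity_date": pd.to_datetime(["2017-04-01", None, "2016-01-15"]),
    "amount": [100, 200, 300],
})

# Drop only the rows where maturity_date is NaT.
df = df.dropna(subset=["maturity_date"])

print(df)
```

Reassigning the result is used here instead of inplace=True; on a whole dataframe either form should give the same rows.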
Now, as observed, maturity_date will look like this -
results["maturity_date"].head()
0 2017-04-01
1 2017-04-01
2 2017-04-01
3 2016-01-15
4 2016-01-15
Name: maturity_date, dtype: datetime64[ns]
As you can see, the dtype is datetime64, meaning this operation worked. If you call describe(), it performs a few standard aggregations and returns the results as a new series. This series is displayed in the same way as any other, including a dtype description that applies to it, not to the column it is describing.
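A quick way to confirm this is to compare the two dtypes directly. This is a rough sketch on a few made-up dates; the exact statistics describe() reports for datetime data differ between pandas versions, so only the dtypes are printed.

```python
import pandas as pd

# Hypothetical column that has already been converted to datetime.
s = pd.Series(
    pd.to_datetime(["2017-04-01", "2017-04-01", "2016-01-15"]),
    name="maturity_date",
)

summary = s.describe()

print(s.dtype)        # dtype of the column itself
print(summary.dtype)  # dtype of the summary series returned by describe()
```

To check whether a conversion worked, inspect the column's own dtype (or df.dtypes) rather than the dtype line printed under describe().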