I was trying to convert a date stored in my dataframe to DateTime format. The column i'm trying to convert has dates stored in mm/dd/yy format.
This is the script i used to convert:
df['dt'] = pd.to_datetime(df['dt'], format = '%d-%m-%Y')
The script runs without an error converting the dates accurately even-though the format provided is not correct.
My question is why the script didn't throw an error when wrong format is provided?
Consider the date 1-2-2020
. Now just by looking at the date can you say exactly what date it is? The answer is no, because, unless you know how the date is formatted or how the date was created i.e whether Day-Month-Year or Month-Day-Year, you can't really say whether the above date is 1st February 2020
or 2nd January 2020
. So, the key here is verifying the dataset and it's origins. There are multiple intuition techniques that you can apply to your data, like, if the data is originated from the United States, the common date format is MM/DD/YYYY
or if India it is DD-MM-YY
.
SAMPLE
>>> import pandas as pd
>>> df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})
>>> df
dt
0 1-1-2020
1 15-2-2020
2 3-24-2020
CODE - Throws error as expected
>>> pd.to_datetime(df['dt'], format = '%d-%m-%Y')
Traceback (most recent call last):
File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 448, in _convert_listlike_datetimes
values, tz = conversion.datetime_to_datetime64(arg)
File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 778, in to_datetime
values = convert_listlike(arg._values, True, format)
File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in _convert_listlike_datetimes
raise e
File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 416, in _convert_listlike_datetimes
arg, format, exact=exact, errors=errors
File "pandas/_libs/tslibs/strptime.pyx", line 142, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '3-24-2020' does not match format '%d-%m-%Y' (match)