python pandas string-to-datetime

Pandas to_datetime ignores the format


I was trying to convert a date column in my dataframe to datetime. The column I'm trying to convert has dates stored in mm/dd/yy format.

This is the script I used to convert it:

df['dt'] = pd.to_datetime(df['dt'], format = '%d-%m-%Y')

The script runs without an error and converts the dates accurately, even though the format provided is not correct.

My question is: why didn't the script throw an error when the wrong format was provided?


Solution

  • Consider the date 1-2-2020. Can you tell exactly which date it is just by looking at it? No: unless you know how the date was formatted or created, i.e. whether it is Day-Month-Year or Month-Day-Year, you can't say whether it is 1st February 2020 or 2nd January 2020. Such a value fits both '%d-%m-%Y' and '%m-%d-%Y', so to_datetime has nothing to reject; it raises only when some value cannot fit the given format, as the sample below shows. The key, therefore, is to verify the dataset and its origins. You can apply some heuristics to your data: for example, if the data originates from the United States, the common date format is MM/DD/YYYY, and if it comes from India, DD-MM-YY.

    SAMPLE

    >>> import pandas as pd
    >>> df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})
    >>> df
              dt
    0   1-1-2020
    1  15-2-2020
    2  3-24-2020
    

    CODE - Throws error as expected

    >>> pd.to_datetime(df['dt'], format = '%d-%m-%Y')
    Traceback (most recent call last):
      File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 448, in _convert_listlike_datetimes
        values, tz = conversion.datetime_to_datetime64(arg)
      File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
    TypeError: Unrecognized value type: <class 'str'>
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
        return func(*args, **kwargs)
      File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 778, in to_datetime
        values = convert_listlike(arg._values, True, format)
      File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in _convert_listlike_datetimes
        raise e
      File "/home/vishnudev/anaconda3/envs/sumyag/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 416, in _convert_listlike_datetimes
        arg, format, exact=exact, errors=errors
      File "pandas/_libs/tslibs/strptime.pyx", line 142, in pandas._libs.tslibs.strptime.array_strptime
    ValueError: time data '3-24-2020' does not match format '%d-%m-%Y' (match)
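
If you would rather find the offending rows than stop at the first exception, something along these lines may help. It is only a sketch built on the sample frame above: the variable names (as_day_first, as_month_first, parsed, bad_rows) are made up for illustration, and it assumes a pandas version where to_datetime accepts errors='coerce' together with format.

```python
import pandas as pd

# Same sample column as above; the last value has 24 in the second field.
df = pd.DataFrame({'dt': ['1-1-2020', '15-2-2020', '3-24-2020']})

# A value whose first two fields are both <= 12 fits either format,
# so neither call raises; the two results simply disagree.
as_day_first = pd.to_datetime(pd.Series(['1-2-2020']), format='%d-%m-%Y')
as_month_first = pd.to_datetime(pd.Series(['1-2-2020']), format='%m-%d-%Y')
print(as_day_first[0].date(), as_month_first[0].date())  # 2020-02-01 2020-01-02

# errors='coerce' turns values that do not fit the format into NaT
# instead of raising, so the rows that contradict the assumed format
# can be listed and inspected.
parsed = pd.to_datetime(df['dt'], format='%d-%m-%Y', errors='coerce')
bad_rows = df.loc[parsed.isna(), 'dt']
print(bad_rows.tolist())  # ['3-24-2020']
```

An empty bad_rows only tells you that every value can be read with that format, not that the format is the right one, so checking where the data came from is still necessary.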