Search code examples
pythonpandasdatepython-re

Is there a way to covert date (with different format) into a standardized format in python?


I have a column calls "date" which is an object and it has very different date format like dd.m.yy, dd.mm.yyyy, dd/mm/yyyy, dd/mm, m/d/yyyy etc as below. Obviously by simply using df['date'] = pd.to_datetime(df['date']) will not work. I wonder for messy date value like that, is there anyway to standardized and covert the date into one single format ?

date
17.2.22 # means Feb 17 2022
23.02.22 # means Feb 23 2022
17/02/2022 # means Feb 17 2022
18.2.22 # means Feb 18 2022
2/22/2022 # means Feb 22 2022
3/1/2022 # means March 1 2022
<more messy different format>

Solution

  • Coerce the dates to datetime and allow invalid entries to be turned into nulls.Also, allow pandas to infer the format. code below

    df['date'] = pd.to_datetime(df['date'], errors='coerce',infer_datetime_format=True)
    
    
        date
    0 2022-02-17
    1 2022-02-23
    2 2022-02-17
    3 2022-02-18
    4 2022-02-22
    5 2022-03-01