Tags: python, pandas, data-science, data-cleaning

Year column has two formats (19 and 2019): how do I convert it into a single format (2019)?


I am cleaning some data in pandas and have a Year column that contains years in two formats: 19 and 2019.

I am currently using pd.to_datetime(chelsea["Year"], format="%y"), but it raises an error. How can I convert this column into a single four-digit format such as 2019?

The error is: ValueError: unconverted data remains: 93
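The error comes from "%y" matching exactly two digits: for a four-digit value such as 1993 it consumes "19" and leaves "93" unparsed, so the whole conversion fails. A minimal reproduction, assuming a hypothetical Series with one value of each format (the exact wording of the message varies between pandas versions):

```python
import pandas as pd

# Hypothetical sample: one two-digit year and one four-digit year
ser = pd.Series(["19", "1993"])

# "%y" matches exactly two digits, so for "1993" it consumes "19"
# and the leftover "93" makes the whole conversion raise ValueError
try:
    pd.to_datetime(ser, format="%y")
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```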


Solution

  • See this related question: Handling multiple datetime formats with pd.to_datetime

    In short, you can pass errors="coerce" so that any value not matching the given format becomes NaT. Repeat that for every format present in the data, then combine the results:

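    import pandas as pd
    
    ser = pd.Series(["2019", "73", "2020", "01", "13", "1998", "99"])
    
    # Four-digit years parse with %Y; two-digit values become NaT
    one = pd.to_datetime(ser, format="%Y", errors="coerce")
    # Two-digit years parse with %y (69-99 -> 19xx, 00-68 -> 20xx); four-digit values become NaT
    two = pd.to_datetime(ser, format="%y", errors="coerce")
    
    # Keep the %Y result where it exists, otherwise fall back to the %y result
    out = one.fillna(two)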
    

    out now contains:

    0   2019-01-01
    1   1973-01-01
    2   2020-01-01
    3   2001-01-01
    4   2013-01-01
    5   1998-01-01
    6   1999-01-01
    dtype: datetime64[ns]
    

    You can then convert those back to just the year:

    years = out.dt.strftime("%Y")  # strings such as "2019"; use out.dt.year for integers
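Applied to the column from the question, the same approach might look like the sketch below. The chelsea DataFrame here is a hypothetical stand-in with made-up values; the cast to str assumes the real column may hold integers rather than strings, and .dt.year turns the parsed dates into integer years.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (values are made up)
chelsea = pd.DataFrame({"Year": [19, 2019, 1993, 20, 2018, 17]})

# Assumption: the column may be numeric, so cast to str before parsing
years_str = chelsea["Year"].astype(str).str.strip()

# Parse each format separately; values that don't match become NaT
four_digit = pd.to_datetime(years_str, format="%Y", errors="coerce")
two_digit = pd.to_datetime(years_str, format="%y", errors="coerce")

# Combine both results and keep only the four-digit year as an integer
chelsea["Year"] = four_digit.fillna(two_digit).dt.year
print(chelsea)
```

Rows that match neither format stay NaT, in which case .dt.year returns floats with NaN for those rows, which makes unexpected values easy to spot.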