I am cleaning some data with pandas
and have a column, Year, that contains years in two formats: two-digit (19) and four-digit (2019).
I am currently using pd.to_datetime(chelsea["Year"], format="%y"),
but it raises an error. How can I convert this column to a single format like 2019?
The error is: ValueError: unconverted data remains: 93
Have a look here: Handling multiple datetime formats with pd.to_datetime
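The error arises because %y consumes only two digits, so a four-digit value such as 1993 leaves 93 unparsed.
In short, you can pass errors="coerce" so that any value not matching the given format becomes NaT instead of raising. If you repeat that for each format present in the data, you can then combine the results: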
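import pandas as pd

ser = pd.Series(["2019", "73", "2020", "01", "13", "1998", "99"])
# Four-digit years parse here; two-digit entries become NaT
one = pd.to_datetime(ser, format="%Y", errors="coerce")
# Two-digit years parse here; four-digit entries become NaT
two = pd.to_datetime(ser, format="%y", errors="coerce")
# Keep the four-digit results and fill the gaps from the two-digit results
out = one.fillna(two)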
out:
0 2019-01-01
1 1973-01-01
2 2020-01-01
3 2001-01-01
4 2013-01-01
5 1998-01-01
6 1999-01-01
dtype: datetime64[ns]
You can then convert those back to just the year (as strings):
years = out.dt.strftime("%Y")
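
If you would rather end up with integers than strings, the same idea can be wrapped in a small helper and applied to the column directly. This is only a sketch: the chelsea DataFrame below is a made-up stand-in for your data, normalize_years is a hypothetical name, and it assumes the column holds strings.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame, which is not shown in the question.
chelsea = pd.DataFrame({"Year": ["2019", "19", "1993", "93"]})


def normalize_years(col: pd.Series) -> pd.Series:
    """Parse mixed two- and four-digit year strings into four-digit integer years."""
    text = col.astype(str).str.strip()
    four = pd.to_datetime(text, format="%Y", errors="coerce")
    two = pd.to_datetime(text, format="%y", errors="coerce")
    # Nullable Int64 keeps unparseable entries as <NA> instead of forcing floats.
    return four.fillna(two).dt.year.astype("Int64")


chelsea["Year"] = normalize_years(chelsea["Year"])
```

Keep in mind that %y follows the usual strptime pivot: 69 to 99 map to 1969 to 1999, and 00 to 68 map to 2000 to 2068, so check that this matches what your two-digit values mean.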