The column I work with looks like the following:
id | date |
---|---|
1 | 20220104 |
2 | 0 |
3 | 20220506 |
4 | 20220206 |
The date column is an "object" type, holding string values. Note that this column can hold "0" as value.
My code that is supposed to convert the string values to datetime:
def transform_column(data,spec)
"""
data: the dataframe
spec: date format I would like it to look like in the end
"""
data = chunk[spec[1]].apply(str)
data = pd.to_datetime(data, format=spec[2])
return data.dt.strftime(spec[3])
The configuration looks something like this (so you have an idea of the format expected in the spec parameter):
"COLUMN_NAME" : ["date", "COLUMN_NAME", "%Y%m%d", "%m/%d/%Y"]
I would like to ask for ideas how to solve the "0" strings in this data, so the conversion works on the correctly formatted strings, and puts NAN or Null or something where it is "0". What is the value that pd.to_datetime could work with better when it should be able to convert the entire column? Is it actually possible?
Thank you in advance!
Use parameter errors='coerce'
in to_datetime
, converting to strings is not necessary:
def transform_column(data,spec):
"""
data: the dataframe
spec: date format I would like it to look like in the end
"""
data[spec[0]] = (pd.to_datetime(data[spec[0]], format=spec[2], errors='coerce')
.dt.strftime(spec[3]))
return data
d = {"col" : ["date", "COLUMN_NAME", "%Y%m%d", "%m/%d/%Y"]}
df = transform_column(df,d['col'])
print (df)
id date
0 1 01/04/2022
1 2 NaN
2 3 05/06/2022
3 4 02/06/2022