I've been trying to process a column date in the Dataframe to obtain the month in int type using pd.to_datetime
.
This is the code in Python using pandas .
print(df["date"].head())
0 Oct
1 Jun
2 15-Oct
3 27-Nov
4 26-Sep
Name: date, dtype: object
After attempting to convert to datetime,I obtained all values in NAT. How do I fix this?
df["date"]=pd.to_datetime(df["date"],errors='coerce')
print(df["date"].head())
I get:
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
Name: date, dtype: datetime64[ns]
Running isNA returns all values at NAT:
print(df["date"].isna().sum())
1000
I plan to obtain:
0 10
1 06
2 10
3 11
4 09
For values that can't be converted to datetime and then int(because values are missing or unrecognisable) I plan to replace with "Date not given"
What do I need to do?
Use Series.str.extract
with Series.map
:
d = {'Jan':'01', 'Feb':'02','Mar':'03', 'Apr':'04',
'May':'05','Jun':'06', 'Jul':'07','Aug':'08',
'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
df["date1"] = df["date"].str.extract(r'([A-Za-z]+)', expand=False).map(d)
Or convert values to datetimes with %b
for match months and convert to strings by Series.dt.strftime
:
df["date2"] = pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce').dt.strftime('%m')
print (df)
date date1 date2
0 Oct 10 10
1 Jun 06 06
2 15-Oct 10 10
3 27-Nov 11 11
4 26-Sep 09 09
If need integers:
print (df)
date
0 Ocyt
1 Jun
2 15-Oct
3 27-Nov
4 26-Sep
df["date2"] = (pd.to_datetime(df["date"].str.extract(r'([A-Za-z]+)', expand=False),
format='%b', errors='coerce')
.dt.month.astype('Int64'))
print (df)
date date2
0 Ocyt <NA>
1 Jun 6
2 15-Oct 10
3 27-Nov 11
4 26-Sep 9