I would like to understand if I can convert a column with mixed types (DateTime and Strings) to a PeriodIndex (for instance month).
I have the following DataFrame:
booking_date ... credit debit
None ... 10185.00 -10185.00
2017-01-01 00:00:00 ... 1796.00 0.00
2018-07-01 00:00:00 ... 7423.20 -11.54
2017-04-01 00:00:00 ... 1704.00 0.00
2017-12-01 00:00:00 ... 1938.60 -1938.60
2018-12-01 00:00:00 ... 1403.47 -102.01
2018-01-01 00:00:00 ... 2028.00 -76.38
2019-01-01 00:00:00 ... 800.00 -256.98
Total ... 10185.00 -10185.00
I'm trying to apply the PeriodIndex to booking_date:
df['booking_date'] = pd.PeriodIndex(df['booking_date'].values, freq='M')
However, I receive the following error:
pandas._libs.tslibs.parsing.DateParseError: Unknown datetime string format, unable to parse: TOTAL
Anyway I can get around with this?
Thanks!
If need Periods
only cannot mixing with strings:
df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
print (df)
booking_date ... credit debit
0 NaT ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 NaT ... 10185.00 -10185.00
But it is possible:
orig = df['booking_date']
df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
df.loc[df['booking_date'].isna(), 'booking_date'] = orig
print (df)
booking_date ... credit debit
0 None ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 Total ... 10185.00 -10185.00
print (df['booking_date'].apply(type))
0 <class 'NoneType'>
1 <class 'pandas._libs.tslibs.period.Period'>
2 <class 'pandas._libs.tslibs.period.Period'>
3 <class 'pandas._libs.tslibs.period.Period'>
4 <class 'pandas._libs.tslibs.period.Period'>
5 <class 'pandas._libs.tslibs.period.Period'>
6 <class 'pandas._libs.tslibs.period.Period'>
7 <class 'pandas._libs.tslibs.period.Period'>
8 <class 'str'>
Name: booking_date, dtype: object
new = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
df['booking_date'] = np.where(new.isna(), df['booking_date'], new)
print (df)
booking_date ... credit debit
0 None ... 10185.00 -10185.00
1 2017-01 ... 1796.00 0.00
2 2018-07 ... 7423.20 -11.54
3 2017-04 ... 1704.00 0.00
4 2017-12 ... 1938.60 -1938.60
5 2018-12 ... 1403.47 -102.01
6 2018-01 ... 2028.00 -76.38
7 2019-01 ... 800.00 -256.98
8 Total ... 10185.00 -10185.00
print (df['booking_date'].apply(type))
0 <class 'NoneType'>
1 <class 'pandas._libs.tslibs.period.Period'>
2 <class 'pandas._libs.tslibs.period.Period'>
3 <class 'pandas._libs.tslibs.period.Period'>
4 <class 'pandas._libs.tslibs.period.Period'>
5 <class 'pandas._libs.tslibs.period.Period'>
6 <class 'pandas._libs.tslibs.period.Period'>
7 <class 'pandas._libs.tslibs.period.Period'>
8 <class 'str'>
Name: booking_date, dtype: object