I have a DataFrame with a column "time_gap" that has the following CategoricalDtype:
CategoricalDtype(categories=['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
'0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
'0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
'0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
'0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
'0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
'0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00'],
, ordered=True)
The values are in hours:minutes:seconds format.
I would like to convert it to a datetime dtype (and ideally get rid of the "0 days").
When I try df["time_gap"] = pd.to_datetime(df["time_gap"]), I get the following error:
TypeError: <class 'pandas._libs.tslibs.timedeltas.Timedelta'> is not convertible to datetime, at position 0
Is there an easy way to convert this CategoricalDtype to datetime?
Thank you in advance for your feedback.
Here are two ways to extract the hour from the categorical column:
import pandas as pd
# dummy data -->
df = pd.DataFrame({"time_gap":['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
'0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
'0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
'0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00',
'0 days 12:00:00', '0 days 13:00:00', '0 days 14:00:00',
'0 days 15:00:00', '0 days 16:00:00', '0 days 17:00:00',
'0 days 18:00:00', '0 days 19:00:00', '0 days 20:00:00',
'0 days 21:00:00', '0 days 22:00:00', '0 days 23:00:00']
})
df["time_gap"] = pd.to_timedelta(df["time_gap"]).astype("category")
# <-- dummy data
# via timedelta:
df["hour"] = df["time_gap"].astype("timedelta64[ns]").dt.total_seconds()/3600
# via datetime:
df["hour_"] = (pd.Timestamp("2022-01-01") + df["time_gap"].astype("timedelta64[ns]")).dt.hour
print(df)
          time_gap  hour  hour_
0  0 days 00:00:00   0.0      0
1  0 days 01:00:00   1.0      1
2  0 days 02:00:00   2.0      2
3  0 days 03:00:00   3.0      3
4  0 days 04:00:00   4.0      4
5  0 days 05:00:00   5.0      5
...
# note that .dt.hour gives you an integer:
print(df.dtypes)
time_gap    category
hour         float64
hour_          int32
dtype: object
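If the goal is mainly to drop the "0 days" prefix, the same datetime route can also give you time-of-day values or plain strings. This is only a sketch: the sample data is made up to mirror the question, the anchor date 2022-01-01 is arbitrary, and the column names "time_of_day" and "time_str" are hypothetical.

```python
import pandas as pd

# made-up sample data mirroring the question: 0 to 23 hours as a categorical column
df = pd.DataFrame({"time_gap": pd.to_timedelta(list(range(24)), unit="h")})
df["time_gap"] = df["time_gap"].astype("category")

# cast the categorical back to timedelta, then anchor it on an arbitrary date
td = df["time_gap"].astype("timedelta64[ns]")
as_datetime = pd.Timestamp("2022-01-01") + td

# datetime.time objects (the column ends up with object dtype)
df["time_of_day"] = as_datetime.dt.time

# plain "HH:MM:SS" strings without the "0 days" prefix
df["time_str"] = as_datetime.dt.strftime("%H:%M:%S")

print(df.head())
```

Note that this only makes sense while every gap is shorter than 24 hours; longer timedeltas would roll over into the next day and lose the day part.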