I would like to convert my timestamp columns into a regressor for the ML model. I need it to be a spline - one continuous variable (one column in the design matrix X), as opposed to a one-hot codded discretized intervals.
In my original data, I have timestamp only as a string at my disposal, and using a few pandas tricks I am able to get what I need.
However, I have a problem when creating the time differences, where I would like to calculate the time distance from the minimal column's value to the value of the current row.
I am proving simple example down below:
import pandas as pd
# Create dataset
df = pd.DataFrame({"Time": ["071323", "081326", "101405", "111545", "124822", "131112"]})
df
df = df.assign(timestamp=lambda d: pd.to_datetime(d["Time"], format="%H%M%S").dt.time)
df.assign(time_trend_from_min = lambda d: d['timestamp'] - d['timestamp'].min())
# The values should be 1 hour, 0 minutes and 3 seconds etc.
The second problem I have is that when I only convert it using pd.to_datetime(d["Time"], format="%H%M%S").dt.time
is is stored as a object
type and thus e.g. patsy would dummy codded every single unique values. I need to create 1D time series of differences, where on the X-axis we have index and y-axis we have the time distance from minimal values.
I am sure it is literally 2 lines of code needed however I am not able to crack it on my own
You can try:
s=pd.to_datetime(df['Time'],format="%H%M%S")
df['time_trend_from_min']=s-s.min()
output of df
:
Time timestamp time_trend_from_min
0 071323 07:13:23 0 days 00:00:00
1 081326 08:13:26 0 days 01:00:03
2 101405 10:14:05 0 days 03:00:42
3 111545 11:15:45 0 days 04:02:22
4 124822 12:48:22 0 days 05:34:59
5 131112 13:11:12 0 days 05:57:49