Search code examples
pandasdatetimedatediffsplinepatsy

Pandas Time (not Date) differences and not as object


I would like to convert my timestamp columns into a regressor for the ML model. I need it to be a spline - one continuous variable (one column in the design matrix X), as opposed to a one-hot codded discretized intervals.

In my original data, I have timestamp only as a string at my disposal, and using a few pandas tricks I am able to get what I need.

However, I have a problem when creating the time differences, where I would like to calculate the time distance from the minimal column's value to the value of the current row.

I am proving simple example down below:

import pandas as pd

# Create dataset
df = pd.DataFrame({"Time": ["071323", "081326", "101405", "111545", "124822", "131112"]})

df

df = df.assign(timestamp=lambda d: pd.to_datetime(d["Time"], format="%H%M%S").dt.time)

df.assign(time_trend_from_min = lambda d: d['timestamp'] - d['timestamp'].min())

# The values should be 1 hour, 0 minutes and 3 seconds etc.

The second problem I have is that when I only convert it using pd.to_datetime(d["Time"], format="%H%M%S").dt.time is is stored as a object type and thus e.g. patsy would dummy codded every single unique values. I need to create 1D time series of differences, where on the X-axis we have index and y-axis we have the time distance from minimal values.

I am sure it is literally 2 lines of code needed however I am not able to crack it on my own


Solution

  • You can try:

    s=pd.to_datetime(df['Time'],format="%H%M%S")
    df['time_trend_from_min']=s-s.min()
    

    output of df:

        Time    timestamp   time_trend_from_min
    0   071323  07:13:23    0 days 00:00:00
    1   081326  08:13:26    0 days 01:00:03
    2   101405  10:14:05    0 days 03:00:42
    3   111545  11:15:45    0 days 04:02:22
    4   124822  12:48:22    0 days 05:34:59
    5   131112  13:11:12    0 days 05:57:49