python pandas machine-learning time-series data-analysis

How to use 24 hour time series data as a predictive feature


I am wondering how best to use this 24-hour time format as a predictive feature. My thought was to bin it into 24 categories, one for each hour of the day. Is there an easy way to convert this object column into Python datetime objects that would make binning easier, or how would you advise handling this feature? Thanks :)

df['Duration']

0         2:50
1         7:25
2        19:00
3         5:25
4         4:45
5         2:25

df['Duration'].dtype

dtype('O')


Solution

  • The best approach depends on what you want your model to learn. In many cases it makes sense to convert the value to a total number of seconds (or minutes, or hours) since a reference point. To convert your data to seconds since 00:00, you can use:

    from datetime import datetime
    
    t_str = "2:50"
    
    # strptime fills in the default date 1900-01-01, so subtracting it leaves the time since 00:00
    t_delta = datetime.strptime(t_str, "%H:%M") - datetime(1900, 1, 1)
    seconds = t_delta.total_seconds()
    hours = seconds/60**2
    
    print(seconds)
    # 10200.0
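
To apply the same idea to the whole column at once, a pandas sketch could look like the following. It assumes df has the Duration column shown in the question (the DataFrame below is a hypothetical stand-in built from the sample values). Instead of subtracting a hard-coded 1900-01-01, it subtracts each timestamp's own midnight via dt.normalize(), and dt.hour yields the 24 hourly bins mentioned in the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame
df = pd.DataFrame({"Duration": ["2:50", "7:25", "19:00", "5:25", "4:45", "2:25"]})

# Parse each "H:MM" string as a time of day (a placeholder date is filled in)
parsed = pd.to_datetime(df["Duration"], format="%H:%M")

# Subtract midnight of that same placeholder date to get the time since 00:00
df["seconds"] = (parsed - parsed.dt.normalize()).dt.total_seconds()

# Integer hour 0-23, usable as one of 24 categorical bins
df["hour"] = parsed.dt.hour

print(df)
```

Like datetime.strptime, this parsing step rejects values of 24:00 or more, so it only fits data that really is a time of day.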
    

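    Python's datetime class cannot represent time values above 23:59, so strptime raises a ValueError for a string such as "25:10". Since the column is named Duration, your data may actually be a length of time rather than a time of day, in which case you may want to represent it as an instance of Python's timedelta class: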

    from datetime import timedelta

    t_str = "2:50"

    # Split "H:MM" into integer hours and minutes; hours may exceed 23
    h, m = map(int, t_str.split(sep=':'))
    t_delta = timedelta(hours=h, minutes=m)

    # Get total number of seconds
    seconds = t_delta.total_seconds()
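
The timedelta approach can also be vectorized over the column. The sketch below assumes the same Duration column; the data is hypothetical, taking the first sample values from the question plus "25:10", a duration that the strptime approach would reject. It splits on the colon, builds a timedelta Series with pd.to_timedelta, and keeps the whole-hour part as a categorical feature.

```python
import pandas as pd

# Hypothetical data: sample values from the question plus "25:10",
# a duration that datetime.strptime(..., "%H:%M") would reject
df = pd.DataFrame({"Duration": ["2:50", "7:25", "19:00", "25:10"]})

# Split "H:MM" into integer hour and minute columns (hours may exceed 23)
parts = df["Duration"].str.split(":", expand=True).astype(int)
hours, minutes = parts[0], parts[1]

# Build a timedelta Series, then convert it to total seconds
duration = pd.to_timedelta(hours, unit="h") + pd.to_timedelta(minutes, unit="min")
df["seconds"] = duration.dt.total_seconds()

# Whole hours as a categorical feature (one bin per hour, not capped at 23)
df["hour_bin"] = hours.astype("category")

print(df)
```

Whether to feed the model the continuous seconds column, the categorical hour bins, or both is a modelling choice; the bins discard the minutes, while the continuous value keeps them.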