Python3: converting timespan to actual time


According to the GTFS specification, the arrival_time field in the GTFS stop_times.txt file can contain hour values of 24 or more.

For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.

Those values are meant to represent future stops that vehicles will make at each specific time.

I have a pandas dataframe consisting of a bunch of columns where one is storing arrival_time values. What I'm trying to accomplish is to convert those values that are greater than 24 to actual 24-hour timestamps, e.g. converting 25:34:21 to 01:34:21.

First, I tried to use str.replace with a regex, but it got messy pretty fast.

df['arrival_time'].str.replace(r'\s(24)', '00', regex=True)
df['arrival_time'].str.replace(r'\s(25)', '01', regex=True)
df['arrival_time'].str.replace(r'\s(26)', '02', regex=True)
...
df['arrival_time'].str.replace(r'\s(31)', '07', regex=True)
...

Later, I looked into timedelta objects to try to solve the issue. The problem is that the values in arrival_time differ depending on when they are pulled from the server. In the morning they might go up to e.g. 27, but after the evening update they might exceed 36. This makes specifying the day span a little bit harder.

I'm not quite sure where I should look to solve my issue.

Original column:

%Y-%m-%d 13:44:01
%Y-%m-%d 13:56:23
%Y-%m-%d 17:59:02
%Y-%m-%d 24:21:45
%Y-%m-%d 26:15:14

Desired state:

%Y-%m-%d 13:44:01
%Y-%m-%d 13:56:23
%Y-%m-%d 17:59:02
%Y-%m-%d+1 00:21:45
%Y-%m-%d+1 02:15:14

Solution

  • This may be what you are looking for

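    provided_times = ["24:00:00", "12:10:32", "36:35:34"]
    corrected_times = []
    
    for time in provided_times:
        # Split off the hour field; int() parses it without resorting to eval().
        hours, rest = time.split(":", 1)
        # divmod gives the number of whole days and the hour within the day.
        days, hour = divmod(int(hours), 24)
        # Prefix with the day offset, or pad with spaces to keep the columns aligned.
        prefix = f"+{days}d " if days else "    "
        corrected_times.append(f"{prefix}{hour:02d}:{rest}")
    
    print(corrected_times)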
    

    The result is

    ['+1d 00:00:00', '    12:10:32', '+1d 12:35:34']
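
Since the question mentions a pandas dataframe, here is a minimal vectorized sketch of the same idea. It relies on pd.to_timedelta accepting hour values of 24 or more, and it assumes the service date is available in its own column. The column names service_date, arrival_dt, arrival_clock and day_offset, as well as the sample date, are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical sample data; the column names and the date are assumptions.
df = pd.DataFrame({
    "service_date": ["2021-03-01"] * 5,
    "arrival_time": ["13:44:01", "13:56:23", "17:59:02", "24:21:45", "26:15:14"],
})

# Parse HH:MM:SS as a duration since the start of the service day.
# Hours of 24 or more simply become "1 days ..." and so on.
offset = pd.to_timedelta(df["arrival_time"])

# Adding the duration to midnight of the service day rolls the date over.
df["arrival_dt"] = pd.to_datetime(df["service_date"]) + offset

# Wall-clock time and the number of days past the service day, if needed separately.
df["arrival_clock"] = df["arrival_dt"].dt.strftime("%H:%M:%S")
df["day_offset"] = offset.dt.days

print(df[["arrival_time", "arrival_dt", "day_offset"]])
```

Because the day offset comes from the parsed duration, it does not matter whether the feed tops out at 27 or goes past 36 or 48 hours. Note that this treats the service day as starting at local midnight and ignores daylight-saving edge cases.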