According to the GTFS specification, the arrival_time field in the GTFS stop_times.txt file can have an hour value greater than 24.
For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.
These values represent stops that a vehicle makes after midnight, on a trip whose service day began the day before.
I have a pandas DataFrame with several columns, one of which stores arrival_time values. What I'm trying to accomplish is to convert the values with an hour of 24 or more into actual 24-hour timestamps, e.g. converting 25:34:21 to 01:34:21.
First, I tried to use str.replace with a regex, but it got messy pretty fast.
df['arrival_time'].str.replace(r'\s(24)', '00', regex=True)
df['arrival_time'].str.replace(r'\s(25)', '01', regex=True)
df['arrival_time'].str.replace(r'\s(26)', '02', regex=True)
...
df['arrival_time'].str.replace(r'\s(31)', '07', regex=True)
...
Later, I looked into timedelta objects to try to solve the issue. The problem is that the range of values in arrival_time may differ depending on when they are pulled from the server. In the morning the values might go up to e.g. 27, but after the evening update they might go over 36. This makes specifying the day span a little bit harder.
I'm not quite sure where I should look to solve my issue.
Original column:
%Y-%m-%d 13:44:01
%Y-%m-%d 13:56:23
%Y-%m-%d 17:59:02
%Y-%m-%d 24:21:45
%Y-%m-%d 26:15:14
Desired state:
%Y-%m-%d 13:44:01
%Y-%m-%d 13:56:23
%Y-%m-%d 17:59:02
%Y-%m-%d+1 00:21:45
%Y-%m-%d+1 02:15:14
This may be what you are looking for:
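provided_times = ["24:00:00", "12:10:32", "36:35:34"]
corrected_times = []
for time in provided_times:
    # int() rather than eval(): eval("08") raises a SyntaxError in Python 3
    num_hour = int(time[:2])
    # Number of whole days the hour value rolls past midnight
    days = num_hour // 24
    count_days = f'+{days}d ' if days else ' '
    # Wrap the hour into 0-23, zero-padded, and keep the ":MM:SS" tail
    corrected_times.append(count_days + f'{num_hour % 24:02d}' + time[2:])
print(corrected_times)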
The result is:
['+1d 00:00:00', ' 12:10:32', '+1d 12:35:34']
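If the values live in a pandas column as in the question, the same idea can be expressed without a loop. The following is only a sketch under some assumptions: the sample dates are made up, the arrival_dt column name is hypothetical, and every value is assumed to look like "YYYY-MM-DD HH:MM:SS" with the date being the service day. It relies on pd.to_timedelta accepting hour values of 24 and above, so the size of the day span never has to be known in advance.

```python
import pandas as pd

# Hypothetical sample data: service date followed by a GTFS-style time
df = pd.DataFrame({
    "arrival_time": [
        "2023-05-01 13:44:01",
        "2023-05-01 17:59:02",
        "2023-05-01 24:21:45",
        "2023-05-01 26:15:14",
        "2023-05-01 49:00:00",
    ]
})

# Split each value on the first whitespace into a date part and a time part
parts = df["arrival_time"].str.split(n=1, expand=True)

# The date part is the service day on which the trip schedule begins
service_day = pd.to_datetime(parts[0], format="%Y-%m-%d")

# The time part is treated as an offset from midnight of the service day
offset = pd.to_timedelta(parts[1])

# Adding the two rolls the date forward by however many days are needed
df["arrival_dt"] = service_day + offset

print(df["arrival_dt"].dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

With this sample data, 24:21:45 and 26:15:14 land on 2023-05-02, and 49:00:00 lands on 2023-05-03 at 01:00:00, so values above 48 are handled the same way as values just past 24.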