Search code examples
sqlgtfs

How can I include in schedules today's departures after midnight using GTFS?


I began with GTFS and offhand ran into big problem with my SQL query:

SELECT *, ( some columns AS shortcuts )
FROM stop_times 
LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
WHERE trips.max_sequence != stop_times.stop_sequence
AND stop_id IN( $incodes )
AND trips.service_id IN ( $service_ids )
AND ( departure_time >= $time )
AND ( trips.end_time >= $time )
AND ( trips.start_time <= $time_plus_3hrs )
GROUP BY t,l,sm
ORDER BY t ASC, l DESC
LIMIT 14

This should show departures from some stop in next 3 hours. It works but with approaching midnight (e.g. 23:50) it catch only "today's departure". After midnight it catch only "new day departures" and departures from previous day are missing, because they have departure_time e.g. "24:05" (=not bigger than $time 00:05). Is possible to use something lighter than UNION same query for next day? If UNION is using, how can I ORDER departures for trimming by LIMIT?

Trips.start_time and end_time are my auxiliary variables for accelerate SQL query execution, it means sequence1-arrival_time and MAXsequence-departure_time of any trip.


Solution

  • Using UNION to link together a query for each day is going to be your best bet, unless perhaps you want to issue two completely separate queries and then merge the results together in your application. The contortionism required to do all this with a single SELECT statement (assuming it's even possible) would not be worth the effort.

    Part of the complexity here is that the set of active service IDs can vary between consecutive days, so a distinct set must be used for each one. (For a suggestion of how to build this set in SQL using a subquery and table join, see my answer to "How do I use calendar exceptions to generate accurate schedules using GTFS?".)

    More complexity arises from the fact the results for each day must be treated differently: For the result set to be ordered correctly, we need to subtract twenty-four hours from all of (and only) yesterday's times.

    Try a query like this, following the "pseudo-SQL" in your question and assuming you are using MySQL/MariaDB:

    SELECT *, SUBTIME(departure_time, '24:00:00') AS t, ...
      FROM stop_times
      LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
      WHERE trips.max_sequence != stop_times.stop_sequence
        AND stop_id IN ( $incodes )
        AND trips.service_id IN ( $yesterdays_service_ids )
        AND ( departure_time >= ADDTIME($time, '24:00:00') )
        AND ( trips.end_time >= ADDTIME($time, '24:00:00') )
        AND ( trips.start_time <= ADDTIME($time_plus_3hrs, '24:00:00') )
      UNION
        SELECT *, departure_time AS t, ...
          FROM stop_times 
          LEFT JOIN trips ON stop_times.trip_id = trips.trip_id
          WHERE trips.max_sequence != stop_times.stop_sequence
            AND stop_id IN ( $incodes )
            AND trips.service_id IN ( $todays_service_ids )
            AND ( departure_time >= $time )
            AND ( trips.end_time >= $time )
            AND ( trips.start_time <= $time_plus_3hrs )
      GROUP BY t, l, sm
      ORDER BY t ASC, l DESC
      LIMIT 14