Search code examples
mysqlsqlgtfs

How can I get the next 2 stop times for each bus that passes a stop (GTFS data)?


My goal is simple. In my app, the user would press on a stop and it would open a screen displaying each route (bus) that is going by that stop and display the next 2 stop times for each route at that stop. Example of what I want to display when user presses stop with stop id of 1571:

Buses traveling by stop 569

Route 2
23 minutes (7:23 AM)
43 minutes (7:43 AM)

Route 37
15 minutes (7:15 AM)
45 minutes (7:45 AM)

Example data returned (what I want):
st.departure_time, rte.route_long_name, tr.trip_headsign
07:23:00, Route 2, Downtown
07:43:00, Route 2, Downtown
07:15:00, Route 37, Downtown
07:45:00, Route 37, Downtown

It should be noted that the stop code is 569 and the stop id is 1571. As per GTFS documentation, a user would usually search the stop code (569) but internally the stop id (1571) is what is searched. These come from the stops.txt file in a GTFS package.

The example is displaying 2 routes (2 and 37) but there could be more routes, there could be only 1 route, or there could be no routes at all passing this stop in the near future (I want to ignore times that are more than 24 hours away).

Disclaimer: I am not so good with SQL. I was able to create a query that appears to retrieve relevant data, however, it returns way more data than just the next 2 stop times for each route. I am parsing it in my app to get the desired results but I feel like it would be a lot easier to just improve my query.

SELECT stop_times.departure_time, routes.route_long_name, trips.trip_headsign, calendar.monday, calendar.tuesday, calendar.wednesday, calendar.thursday, calendar.friday, calendar.saturday, calendar.sunday
FROM stop_times, trips, routes, calendar
WHERE stop_times.stop_id = '1571' AND stop_times.trip_id = trips.trip_id AND trips.route_id = routes.route_id AND trips.service_id = calendar.service_id AND 20180801 >= calendar.start_date AND 20180801 <= calendar.end_date
ORDER BY routes.route_short_name, trips.service_id, stop_times.departure_time

Here are the relevant tables. I store my data in database exactly as specified in the GTFS documentation.

stops.txt example rows:
enter image description here

routes.txt example rows:
enter image description here

trips.txt example rows:
enter image description here

stop_times.txt example rows:
enter image description here

calendar.txt example rows:
enter image description here

Thank you very much for any help.


Solution

  • One of the things you could already do is use the standard JOIN notation.

    It's more readable, and less prone to causing an accidental cartesian join.

    And using short alias names can make a SQL more concise .

    SET @StopId = '1571';
    SET @CalDate = cast('2018-08-01' as date);
    
    SELECT 
     st.departure_time, 
     rte.route_long_name, 
     tr.trip_headsign,
     cal.monday, cal.tuesday, cal.wednesday, cal.thursday, cal.friday, 
     cal.saturday, cal.sunday
    FROM stop_times AS st
    JOIN trips AS tr ON tr.trip_id = st.trip_id
    JOIN routes AS rte ON rte.route_id = tr.route_id
    JOIN calendar AS cal ON cal.service_id = tr.service_id
    WHERE st.stop_id = @StopId
    AND @CalDate between cal.start_date and cal.end_date
    ORDER BY rte.route_short_name, tr.service_id, st.departure_time
    

    As for that requirement to get the next 2 stops per route?

    Here's an example SQL for that.
    It includes a check on the weekday. Not sure if that's needed, so that's just to demonstrate.

    A SQL Fiddle for it can be found here

    SET @StopId = 1571;
    SET @CalDate = cast('2018-08-01' as date);
    SET @StartTime = cast('08:00:00' as time);
    
    SELECT departure_time, route_long_name, trip_headsign
    FROM 
    (
        SELECT 
         st.departure_time,
         tr.service_id,
         rte.route_short_name,
         rte.route_long_name,
         tr.trip_headsign,
         @num := if(@prev_routeid = tr.route_id, @num + 1, 1) as RN,
         @prev_routeid := tr.route_id as route_id
        FROM stop_times AS st
        JOIN trips AS tr ON tr.trip_id = st.trip_id
        JOIN routes AS rte ON rte.route_id = tr.route_id
        JOIN calendar AS cal ON cal.service_id = tr.service_id
        -- LEFT JOIN stops AS s on s.stop_id = st.stop_id
        CROSS JOIN (SELECT @num := 0, @type := '') AS vars
        WHERE st.stop_id = @StopId
        AND @CalDate between cal.start_date and cal.end_date
        AND (CASE WEEKDAY(@CalDate)
            WHEN 0 THEN monday 
            WHEN 1 THEN tuesday 
            WHEN 2 THEN wednesday
            WHEN 3 THEN thursday
            WHEN 4 THEN friday
            WHEN 5 THEN saturday
            WHEN 6 THEN sunday
            END) = 1
        AND st.departure_time >= @StartTime
        ORDER BY tr.route_id, st.departure_time
    ) Q
    WHERE RN <= 2
    ORDER BY route_short_name, service_id, departure_time;
    

    Side-note:
    The RN is calculated in an old MySql 7 style, using variables.
    In MySql 8 the window function ROW_NUMBER can be used instead.