Search code examples
sqlgtfs

How can I list a stop's directly connected neighbors using GTFS?


Lets say that these statemates are included in GTFS data:

1. stop A is connected with stop B and stop C
2. stop B is connected with stop D and stop E
3. stop C is connected with Stop F and stop G

Now I want to query all stops, that are directly connected with stop A. The result should be stop B and stop C.

Are those SQL queries possible with GTFS data?


Solution

  • Yes, although depending on your application the performance might not be everything you expect.

    Following your example, let's call stop A the origin stop and stops B and C the connected stops. Then to build the set of connected stops, we want to find all the stops that

    • Share a trip with the origin stop, and

    • On that shared trip are visited either immediately before or immediately after the origin stop.

    We can use the stop_sequence field of the stop_times table to test the second condition. Try this SQL:

    SELECT DISTINCT stops.id, stops.code, stops.name
      FROM stop_times AS origin_stop
      INNER JOIN stop_times AS connected_stops
        ON connected_stops.trip_id = origin_stop.trip_id
          AND ABS(connected_stops.stop_sequence - origin_stop.stop_sequence) = 1
      INNER JOIN stops
        ON stops.id = connected_stops.stop_id
      WHERE origin_stop.stop_id = <origin stop ID>;
    

    This joins the stops table for a prettier view; if all your application cares about are the connected stops' IDs you can remove that join and change the first line to simply "SELECT DISTINCT connected_stops.stop_id".

    To make performance acceptable you'll want to create indices on stop_times.stop_id and stop_times.trip_id:

    CREATE INDEX stop_times_stop_id_index ON stop_times(stop_id);
    
    CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);
    

    But if you're developing an online application, this is probably not the way to go. You'd be better off precalculating connections between stops using a graph representation of the transit network in memory and then querying that at runtime.