Search code examples
mysqlsqlgtfs

How can I list all the trips from one stop to another using GTFS?


I am using the below table:

1   0051ML66220600132482    06:00:00        06:00:00        1538    100 0   1
2   0051ML66220600132482    06:00:00        06:00:00        1540    200 0   0
3   0051ML66220600132482    06:00:00        06:00:00        1541    300 0   0
4   0051ML66220600132482    06:01:00        06:01:00        1542    400 0   0
5   0051ML66220600132482    06:01:00        06:01:00        1543    500 0   0
6   0051ML66220600132482    06:02:00        06:02:00        1544    600 0   0
7   0051ML66220600132482    06:03:00        06:03:00        1546    700 0   0

Table structure us like below:

> ------------------------------------------------------------------
> --  TABLE stop_times
> ------------------------------------------------------------------
> 
> CREATE TABLE stop_times ( id int(12),
>                           trip_id varchar(100),
>                           arrival_time varchar(8),
>                           arrival_time_seconds int(100),
>                           departure_time varchar(8),
>                           departure_time_seconds int(100),
>                           stop_id varchar(100),
>                           stop_sequence varchar(100),
>                           pickup_type varchar(2),
>                           drop_off_type varchar(2) );

I am attempting to get DISTINCT trip_id is it matches both the destination and arrival stop_id

I have tried the below SQL with no luck:

select DISTINCT trip_id from stop_times where stop_id=1538 AND stop_id =1540;

Where is should produce: 0051ML66220600132482

I have also tried a INNER JOIN SQL like below:

SELECT 
       t.trip_id,
       start_s.stop_name as departure_stop,
       end_s.stop_name as arrival_stop
FROM
trips t 
        INNER JOIN stop_times start_st ON t.trip_id = start_st.trip_id
        INNER JOIN stops start_s ON start_st.stop_id = start_s.stop_id
        INNER JOIN stop_times end_st ON t.trip_id = end_st.trip_id
        INNER JOIN stops end_s ON end_st.stop_id = end_s.stop_id
WHERE 
   start_s.stop_id = 1538 
  AND end_s.stop_id = 1540;

But it is sooo slow, and takes about 8-15 seconds for this simple query.

Explain Added:

enter image description here

What is the fastest/Best way to make this query?


Solution

  • So in other words, you are looking for a query that will identify all the trips that pass through a pair of stops, the origin (starting point) and destination (ending point).

    Try this query:

    SELECT destination.trip_id
        FROM stop_times AS origin
        INNER JOIN stop_times AS destination
            ON destination.trip_id = origin.trip_id
            AND destination.stop_id = 1540
        WHERE origin.stop_id = 1538
            AND origin.stop_sequence < destination.stop_sequence;
    

    Or, for a prettier view (and to match the second query in your question):

    SELECT destination.trip_id, origin_stop.name, destination_stop.name
        FROM stop_times AS origin
        INNER JOIN stop_times AS destination
            ON destination.trip_id = origin.trip_id
            AND destination.stop_id = 1540
        INNER JOIN stops AS origin_stop
            ON origin_stop.id = origin.stop_id
        INNER JOIN stops AS destination_stop
            ON destination_stop.id = destination.stop_id
        WHERE origin.stop_id = 1538
            AND origin.stop_sequence < destination.stop_sequence;
    

    For good performance, create an index first on stop_id and trip_id:

    CREATE INDEX stop_times_stop_id_trip_id_index ON stop_times(stop_id, trip_id);
    

    (Note that EternalHour's query identifies all the trips that pass through either stop, not only trips that pass through one first and then the other.)