mysql gtfs

How can I rearrange the table in MySQL?


I have a GTFS table, stop_times.txt, whose format is something like:

+------------------+---------------+
|     trip_id      | stop_sequence |
+------------------+---------------+
| 4503599630773892 |      0        |
| 4503599630773892 |      1        |
|       ...        |      ...      |
| 4503599630773892 |      27       |
| 4503599630810392 |      0        |
| 4503599630810392 |      1        |
|       ...        |      ...      |
| 4503599630810392 |      17       |
| 4503599631507892 |      0        |
| 4503599631507892 |      1        |
|       ...        |      ...      |
| 4503599631507892 |      29       |
|       ...        |      ...      |
+------------------+---------------+

My expected result is:

+------------------+------------+-----------+
|     trip_id      | first_stop | last_stop |
+------------------+------------+-----------+
| 4503599630773892 |     0      |    27     |
| 4503599630810392 |     0      |    17     |
| 4503599631507892 |     0      |    29     |
|       ...        |    ...     |    ...    |
+------------------+------------+-----------+

PS: The title might not be precise. Please refine it.


One further question: how can I add the stop_name that corresponds to each stop_sequence to this table?


Here is my incorrect attempt. It returns the same stop_name for first_stop and last_stop, whereas the two names should differ because they correspond to different stop_id values:

(SELECT routes.route_short_name, MIN(stop_times.stop_sequence) AS first_stop, stops.stop_name, MAX(stop_times.stop_sequence) AS last_stop, stops.stop_name
FROM stop_times
JOIN stops ON stops.stop_id=stop_times.stop_id
JOIN trips ON stop_times.trip_id=trips.trip_id 
JOIN routes ON routes.route_id=trips.route_id 
GROUP BY stop_times.trip_id);

EDIT: I got it working after several hours of work. Here is the key query:

SELECT T1.trip_id,
       T1.stop_sequence AS first_stop_sequence, T1.stop_id AS first_stop_id,
       T2.stop_sequence AS last_stop_sequence, T2.stop_id AS last_stop_id
FROM
    -- derived table T1: trip_id, stop_sequence=MIN, stop_id (first stop)
    (SELECT st_first1.trip_id, st_first1.stop_sequence, st_first1.stop_id
    FROM stop_times st_first1
    INNER JOIN 
        -- pick out the first stop of each trip: trip_id, stop_sequence=MIN
        (SELECT stop_times.trip_id, MIN(CAST(stop_times.stop_sequence AS UNSIGNED)) AS first_stop
        FROM stop_times
        GROUP BY stop_times.trip_id
        ) st_first2
    ON st_first1.trip_id=st_first2.trip_id AND st_first1.stop_sequence=st_first2.first_stop
    ) T1

LEFT JOIN -- combine T1 and T2

    -- derived table T2: trip_id, stop_sequence=MAX, stop_id (last stop)
    (SELECT st_last1.trip_id, st_last1.stop_sequence, st_last1.stop_id
    FROM stop_times st_last1
    INNER JOIN
        -- pick out the last stop of each trip: trip_id, stop_sequence=MAX
        (SELECT stop_times.trip_id, MAX(CAST(stop_times.stop_sequence AS UNSIGNED)) AS last_stop
        FROM stop_times
        GROUP BY stop_times.trip_id
        ) st_last2
    ON st_last1.trip_id=st_last2.trip_id AND st_last1.stop_sequence=st_last2.last_stop
    ) T2

ON T1.trip_id=T2.trip_id
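
The CAST in the query above matters if stop_sequence was imported as text, which the CAST suggests but the post does not confirm. In that case MIN and MAX compare strings, so '9' sorts after '27'. The sketch below illustrates this with Python's sqlite3 module as a stand-in for MySQL (SQLite uses CAST ... AS INTEGER where MySQL uses AS UNSIGNED). The table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: stop_sequence is stored as text, as a raw CSV import might do.
conn.execute("CREATE TABLE stop_times (trip_id TEXT, stop_sequence TEXT)")
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?)",
    [("A", "0"), ("A", "9"), ("A", "27")],  # hypothetical sample rows
)

# Without a cast, MAX compares the values as strings.
text_max = conn.execute(
    "SELECT MAX(stop_sequence) FROM stop_times"
).fetchone()[0]

# With a cast, MAX compares the values as numbers.
num_max = conn.execute(
    "SELECT MAX(CAST(stop_sequence AS INTEGER)) FROM stop_times"
).fetchone()[0]

print(text_max, num_max)
```

If the column is already numeric, the cast is harmless but unnecessary.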

Solution

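  • GROUP BY trip_id and take the MIN and MAX of stop_sequence to get the first and last stops, respectively. Note that the join to stops in the query below returns one row per stop name on each trip, not only the names of the first and last stops.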

    SELECT DISTINCT st.trip_id, s.stop_name, t.first_stop, t.last_stop
    FROM stop_times st INNER JOIN stops s
    ON st.stop_id = s.stop_id
    RIGHT JOIN
    (
        SELECT trip_id, MIN(stop_sequence) AS first_stop, MAX(stop_sequence) AS last_stop
        FROM stop_times
        GROUP BY trip_id
    ) t
    ON t.trip_id = st.trip_id
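
To get the names of only the first and last stops, one option is to join the per-trip MIN/MAX result back to stop_times twice (once on the first sequence, once on the last) and then to stops twice. Below is a minimal sketch of that idea, run through Python's sqlite3 module as a stand-in for MySQL. The sample rows, stop names, and the aliases b, st_first, st_last, s_first, and s_last are illustrative assumptions, and stop_sequence is assumed to be numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature GTFS tables with made-up rows.
conn.executescript("""
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER, stop_id TEXT);
INSERT INTO stops VALUES ('s1', 'Alpha'), ('s2', 'Beta'), ('s3', 'Gamma');
INSERT INTO stop_times VALUES
  ('t1', 0, 's1'), ('t1', 1, 's2'), ('t1', 2, 's3'),
  ('t2', 0, 's3'), ('t2', 1, 's1');
""")

QUERY = """
SELECT b.trip_id,
       b.first_stop, s_first.stop_name AS first_stop_name,
       b.last_stop,  s_last.stop_name  AS last_stop_name
FROM (
    -- one row per trip with its lowest and highest stop_sequence
    SELECT trip_id,
           MIN(stop_sequence) AS first_stop,
           MAX(stop_sequence) AS last_stop
    FROM stop_times
    GROUP BY trip_id
) b
-- look up the stop_times row for the first stop, then for the last stop
JOIN stop_times st_first
  ON st_first.trip_id = b.trip_id AND st_first.stop_sequence = b.first_stop
JOIN stop_times st_last
  ON st_last.trip_id = b.trip_id AND st_last.stop_sequence = b.last_stop
-- resolve each stop_id to its name separately
JOIN stops s_first ON s_first.stop_id = st_first.stop_id
JOIN stops s_last  ON s_last.stop_id  = st_last.stop_id
ORDER BY b.trip_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Joining stops under two different aliases is what lets first_stop_name and last_stop_name differ. A single join to stops can only supply one name per row.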