Search code examples
sqldatabasegtfstransitrevenue

GTFS Database - SQL Queries for "Revenue Mileage" and "Revenue Hours"


I'm trying to find the number of revenue miles/kilometers of a "Route" by: day, month, and year; by querying a GTFS database that has the structure described here:

https://developers.google.com/transit/gtfs/reference

And see here for a very clear sketch of the structure:

http://blog.openplans.org/2012/08/the-openplans-guide-to-gtfs-data/

"Revenue distance traveled" definition:

("Available for passengers to use" distance)

The number of miles/kilometers traveled from the first actual bus stop where a passenger can board, to the last drop-off at the last bus stop, for that particular route and bus run. (then aggregated together for all service runs taken by all buses for that particular route)

-

"Revenue hours" definition:

("Available for passengers to use" time span)

The number of hours from the moment the vehicle arrives at the first bus stop, until the moment it drops off its last passenger at the last bus stop. (then aggregated together for all service runs taken by all buses for that particular route)

I'm using SQL Server/MSSQL. Though SQL Lite, or MySQL, or any SQL examples would be perfectly fine.

Basically, I need to be able to SELECT a route, and then correlate the data in the routes, calendar_dates, calendar, stop-times, stops, and trips tables to find how many miles/kilometers were covered from the first stop (stop_times and stops tables) to the last, how many hours elapsed, and find this for a particular service_id (in trips and calendar tables), and then also for all service_ids for a particular route, and be able to get all this for a particular date (in calendar_dates table), or spans of dates (day, month, 3-month period, year, etc).

If a few different queries are needed, that's fine. The revenue distance traveled per route, and the revenue hours per route, can be separate queries.

Has anyone who has done this before be willing to share their query structure for this or has anyone figured this out? Are there any examples of how to write this query? I've been looking everywhere online for weeks.

Here is a diagram image of the database I have created with all relationships shown in detail:

GTFS Database Diagram


Solution

  • I have done exactly this for scheduled kms, by:

    1. Loading GTFS into a DB via GTFS SQL importer and PostGIS
    2. Making the shape table spatial
    3. Calculate distance for each shape
    4. Aggregate as below (see note on service id).

    select t.route_id as id, r.route_short_name as route, sum(l.shape_dist/1000) as sched_kms 
    from gtfs_shape_lengths l
    
    inner join gtfs_trips t on t.shape_id = l.shape_id
    inner join gtfs_routes r on r.route_id = t.route_id
    inner join gtfs_calendar c on t.service_id = c.service_id
    
    where c.service_id ilike '%sat%'
    
    group by t.route_id, r.route_short_name
    
    union all
    
    select 'total' as id, 'total_' as name,
    sum(l.shape_dist/1000) as sched_kms
    
    from gtfs_shape_lengths l
    
    inner join gtfs_trips t on t.shape_id = l.shape_id
    inner join gtfs_calendar c on t.service_id = c.service_id
    
    where c.service_id ilike '%sat%'
    
    order by sched_kms desc
    

    Original writeup here: http://transitdata.net/using-gtfs-and-postgis-to-calculate-levels-of-scheduled-service/