Why is GTFS so difficult?

I'm trying to create an Arduino-based display that shows the times of the next few trains at a Metro North stop in NYC. Metro North is run by the MTA (the NYC subway and bus agency), and I had no trouble creating a similar display for MTA buses. But the train data is very difficult. It can't be filtered with GET parameters in the URL. The feed is organized by train, so I have to parse it myself to get stop-level data.

I have a web page where I can run simple PHP, Java, etc. I think I need to create a page that will parse the "json"/"XML" file from the MTA for me, so that the Arduino can use the simplified data. I can get real-time data using a URL and my API key. This is what it looks like:


Like drinking from a firehose.

It's in a format called GTFS, which is supposed to be "universal", but I'm struggling to find anything that works with it. For example, simplexml_load_file() in PHP won't work.

Arduinos are not good at parsing text, so I can't make the Arduino do all of the work.

What method should I learn? I do not have much control over my server. I can't change the way PHP runs on my server easily. The methods I've found so far seem to require extensions and other things I just can't do.


  • Yes, "drinking from a firehose" is a good metaphor for working with GTFS data. One aspect of GTFS is to put as little load as possible on the transit agency's system and have the app developer's system (yours) do as much of the work as possible. Both pieces (the "static" GTFS file with the current route information and the GTFS Realtime files) can be static data files served from a "dumb" web server (no web application code required), which makes serving the data very scalable.

    Some transit agencies (like MTA, in the case of their bus routes) are nice and do some of the work for you, providing XML or JSON APIs that give you the current realtime status of specific routes. But if you learn how to process GTFS data on your end, you don't need to rely on their API, and your system becomes "universal", able to deal with any agency serving GTFS realtime information.

    My guess is you have already looked at the docs but, just in case, here's the reference:

    Ordinarily, you would use the combination of the GTFS file (containing the calendar, routes, trips, stops, and stop_times) and the GTFS realtime files (specifically, the tripUpdate file) to determine the current status of a vehicle in relation to the stops on its current trip. The realtime feed might just give you the vehicles' delay times on their current trips, and then you would need to look up the stop time for each trip at the target stop and adjust it by the corresponding delay.
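
    As a rough illustration of that lookup-and-adjust step, here is a minimal PHP sketch (PHP 7.1 or later). The column names are those defined for the GTFS stop_times.txt file; the function names scheduledDeparture() and applyDelay() and the sample rows are made up for the example, and a real stop_times.txt would be read from the agency's GTFS zip rather than from a string.

    ```php
    <?php
    // Sketch only: the sample rows are invented for illustration. The column
    // names come from the GTFS stop_times.txt definition.

    /** Find the scheduled departure ("HH:MM:SS") for a trip at a stop, or null. */
    function scheduledDeparture(string $stopTimesCsv, string $tripId, string $stopId): ?string
    {
        $lines  = preg_split('/\r\n|\n|\r/', trim($stopTimesCsv));
        $header = str_getcsv(array_shift($lines), ',', '"', '\\');
        foreach ($lines as $line) {
            // Key each row by column name so the column order does not matter.
            $row = array_combine($header, str_getcsv($line, ',', '"', '\\'));
            if ($row['trip_id'] === $tripId && $row['stop_id'] === $stopId) {
                return $row['departure_time'];
            }
        }
        return null;
    }

    /** Shift a GTFS "HH:MM:SS" time by a delay given in seconds. */
    function applyDelay(string $hhmmss, int $delaySeconds): string
    {
        [$h, $m, $s] = array_map('intval', explode(':', $hhmmss));
        $total = $h * 3600 + $m * 60 + $s + $delaySeconds;
        // GTFS allows hours past 24 for trips that run over midnight, so the
        // hour is deliberately not wrapped back to 00 here.
        return sprintf('%02d:%02d:%02d', intdiv($total, 3600), intdiv($total % 3600, 60), $total % 60);
    }

    // Hypothetical schedule rows for trip 1586.
    $csv = "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
         . "1586,22:45:00,22:46:00,144,5\n"
         . "1586,22:51:00,22:52:00,145,6\n";

    $scheduled = scheduledDeparture($csv, '1586', '144');   // "22:46:00"
    echo applyDelay($scheduled, 60), "\n";                   // 22:47:00
    ```

    A full implementation would also need calendar.txt to know which trips run today; this sketch skips that part.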

    However, MTA is being nice again and they are giving you enough information just in the realtime feed to show when a vehicle (e.g. train 1586) is going to arrive at a stop (e.g. stop 144) at a time (e.g. unix time 1442890020 = Tue, 22 Sep 2015 02:47:00 GMT) and a delay (e.g. 60 seconds).
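
    To make those numbers concrete, here is a small PHP sketch that converts the example timestamp. It assumes that, as in the GTFS Realtime reference, "time" is the predicted time and "delay" is measured against the schedule; the America/New_York time zone is my choice for display and is not something the feed tells you.

    ```php
    <?php
    $time  = 1442890020;   // predicted departure from the feed (unix time)
    $delay = 60;           // seconds behind schedule

    // Same instant expressed in GMT.
    echo gmdate('D, d M Y H:i:s', $time), " GMT\n";   // Tue, 22 Sep 2015 02:47:00 GMT

    // Assumption: scheduled time = predicted time - delay.
    echo gmdate('H:i:s', $time - $delay), " GMT\n";   // 02:46:00 GMT

    // Local wall-clock time for a New York display.
    $local = (new DateTime('@' . $time))->setTimezone(new DateTimeZone('America/New_York'));
    echo $local->format('D g:i A'), "\n";             // Mon 10:47 PM
    ```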

    So, if your PHP process is trying to create a list of upcoming departure times for a target stop, you would:

    1. parse the realtime feed into an object collection (I'm assuming you can do that in PHP with json_decode(), as Twisty mentioned above)
    2. iterate through the array of "trip_update" objects
    3. under the "stop_time_update" object, iterate through the array of "departure" objects
    4. for each "departure" with a "stop_id" that matches the stop you want your device to display information for, add its "time" value (unix time) to a collection.
    5. sort that resulting collection and convert from unix time to whatever PHP's local datetime object is and format it as text
    6. send the next two departure times to your device
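
    The steps above could be sketched in PHP (7.0 or later) roughly as follows. I can't see your feed, so the key names are an assumption: they follow the GTFS Realtime message layout (entity, trip_update, stop_time_update, with stop_id and departure.time on each stop update), and you may need to adjust them to whatever the MTA JSON uses. The function name nextDepartures() and the sample data are made up for the example.

    ```php
    <?php
    // Sketch only: the key names follow the GTFS Realtime message layout and
    // are an assumption about how the MTA JSON is structured.

    /** Return up to $limit upcoming departure times (unix time) for one stop. */
    function nextDepartures(string $feedJson, string $stopId, int $now, int $limit = 2): array
    {
        $feed = json_decode($feedJson, true);   // true => associative arrays
        if (!is_array($feed)) {
            return [];                           // invalid or empty JSON
        }

        $times = [];
        foreach ($feed['entity'] ?? [] as $entity) {
            $updates = $entity['trip_update']['stop_time_update'] ?? [];
            foreach ($updates as $update) {
                $time = $update['departure']['time'] ?? null;
                // Cast to string so a numeric 144 and a string "144" both match.
                $sameStop = (string)($update['stop_id'] ?? '') === $stopId;
                if ($time !== null && $sameStop && (int)$time >= $now) {
                    $times[] = (int)$time;
                }
            }
        }

        sort($times);                            // earliest departure first
        return array_slice($times, 0, $limit);
    }

    // Hypothetical sample using the values quoted above (train 1586, stop 144).
    $sample = json_encode(['entity' => [
        ['trip_update' => [
            'trip' => ['trip_id' => '1586'],
            'stop_time_update' => [
                ['stop_id' => '144', 'departure' => ['time' => 1442890020, 'delay' => 60]],
                ['stop_id' => '145', 'departure' => ['time' => 1442890320, 'delay' => 60]],
            ],
        ]],
    ]]);

    // In real use, pass time() as $now and the body fetched from the feed URL.
    foreach (nextDepartures($sample, '144', 1442889000) as $t) {
        echo date('H:i', $t), "\n";              // one short line per departure
    }
    ```

    Printing one plain-text time per line keeps the Arduino side simple, since it only has to read a couple of short lines instead of parsing JSON. Note that date() uses the server's default time zone.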

    That would be a fast-track solution for you for this single case.