I am trying to parse GTFS data and build a polyline shape (an array of latitude and longitude pairs) for a single route. But in my sample GTFS data I found that the trips of a single route use multiple shape IDs. Here is an excerpt from the GTFS data:
route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
90,YRT,90,LESLIE,,3,,FDAE35,FFFFFF
route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,bikes_allowed
90,1,1286467,Richmond Green Secondary School - NB,,0,131905,59628,1,1
90,1,1286468,Richmond Green Secondary School - NB,,0,131907,59628,1,1
90,1,1286380,Richmond Green Secondary School - NB,,0,131906,59629,1,1
90,1,1286469,Richmond Green Secondary School - NB,,0,131908,59628,1,1
90,1,1286381,Richmond Green Secondary School - NB,,0,131904,59629,1,1
90,1,1286382,Richmond Green Secondary School - NB,,0,131905,59629,1,1
...
90,1,1286399,Richmond Green Secondary School - NB,,0,131960,59629,1,1
90,1,1286400,Richmond Green Secondary School - NB,,0,131961,59629,1,1
90,1,1286470,Richmond Green Secondary School - NB,,0,131921,59630,1,1
90,1,1286471,Richmond Green Secondary School - NB,,0,131922,59630,1,1
90,1,1286401,Richmond Green Secondary School - NB,,0,131962,59629,1,1
90,1,1286402,Richmond Green Secondary School - NB,,0,131960,59629,1,2
shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence,shape_dist_traveled
59628,43.902752,-79.398992,72,7.2214
59628,43.902585,-79.399005,73,7.2405
59629,43.775996,-79.346326,1,0.0000
59629,43.775987,-79.346238,2,0.0071
...
59629,43.902752,-79.398992,317,15.7832
59629,43.902585,-79.399005,318,15.8022
59630,43.811197,-79.360774,1,0.0000
59630,43.812373,-79.361259,2,0.1364
I was expecting one shape per trip, or at least for the shapes to appear in sequential order. But this trip data is throwing me off:
route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,shape_id,wheelchair_accessible,bikes_allowed
90,1,1286400,Richmond Green Secondary School - NB,,0,131961,59629,1,1
90,1,1286470,Richmond Green Secondary School - NB,,0,131921,59630,1,1
90,1,1286471,Richmond Green Secondary School - NB,,0,131922,59630,1,1
90,1,1286401,Richmond Green Secondary School - NB,,0,131962,59629,1,1
Notice that shape #59629 is followed by #59630, but after that #59629 appears again. How can I make sense of this? Is it a data issue?
Shapes are not associated with routes; they are only associated with individual trips. It is quite common for a single route to encompass two or more shapes.
In fact, since shapes explicitly encode a direction of motion, there will always be at least 2 shapes for routes that are split into "there-and-back" trip pairs (which is the most common approach for simple bus routes in practice). More complex possibilities include routes with multiple branches, or routes with some short-turning trips.
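To see how many distinct shapes a route actually uses, you can count trips per shape_id, grouped by route and direction. The following is only a minimal sketch: the function name `shapes_by_route` is made up for illustration, and the sample input is a trimmed-down version of the trips rows from the question (fewer columns), not the real file.

```python
import csv
import io
from collections import Counter, defaultdict


def shapes_by_route(trips_file):
    """Count trips per shape_id, grouped by (route_id, direction_id).

    `trips_file` is any open text file in trips.txt layout.
    This helper is hypothetical, written only for this example.
    """
    usage = defaultdict(Counter)
    for row in csv.DictReader(trips_file):
        shape_id = row.get("shape_id") or ""
        if not shape_id:
            continue  # shape_id is optional, so a trip may have none
        key = (row["route_id"], row.get("direction_id") or "")
        usage[key][shape_id] += 1
    return usage


# Trimmed sample based on the rows in the question (assumed column subset).
sample = io.StringIO(
    "route_id,service_id,trip_id,direction_id,shape_id\n"
    "90,1,1286400,0,59629\n"
    "90,1,1286470,0,59630\n"
    "90,1,1286471,0,59630\n"
    "90,1,1286401,0,59629\n"
)

usage = shapes_by_route(sample)
print(dict(usage[("90", "0")]))
```

Note that the shape IDs are kept as strings and the order of rows in the file plays no role in the result.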
Furthermore, there is no ordering implied by the shape IDs; i.e. there is no sense in which 59630 is "before" or "after" 59629. In principle, these are arbitrary strings.
In short, the data you are working with looks fine; there is simply no unambiguous way to do what you want in the general case. However, depending on the particulars of your case, it may be possible to take a more manual approach and combine multiple shapes into a single coherent polyline.
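As a starting point for such a manual approach, you could build one polyline per shape (ordering points by shape_pt_sequence rather than by file order) and then pick a representative shape for the route. The sketch below is just one possible heuristic, not a general solution: `load_polylines` and `longest_shape` are hypothetical helpers, "the shape with the most points" is an assumption about what counts as representative, and the sample is a few rows from the question's excerpt, deliberately shuffled.

```python
import csv
import io
from collections import defaultdict


def load_polylines(shapes_file):
    """Return {shape_id: [(lat, lon), ...]} ordered by shape_pt_sequence."""
    points = defaultdict(list)
    for row in csv.DictReader(shapes_file):
        points[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),
                float(row["shape_pt_lat"]),
                float(row["shape_pt_lon"]),
            )
        )
    # Sort by sequence number, then drop it to leave plain (lat, lon) pairs.
    return {
        shape_id: [(lat, lon) for _, lat, lon in sorted(pts)]
        for shape_id, pts in points.items()
    }


def longest_shape(polylines, shape_ids):
    """Heuristic (an assumption): pick the shape with the most points."""
    return max(shape_ids, key=lambda sid: len(polylines.get(sid, [])))


# A few rows from the question's shapes excerpt, shuffled on purpose.
sample = io.StringIO(
    "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence,shape_dist_traveled\n"
    "59629,43.775987,-79.346238,2,0.0071\n"
    "59630,43.811197,-79.360774,1,0.0000\n"
    "59629,43.775996,-79.346326,1,0.0000\n"
    "59629,43.902752,-79.398992,317,15.7832\n"
    "59630,43.812373,-79.361259,2,0.1364\n"
)

polylines = load_polylines(sample)
representative = longest_shape(polylines, ["59629", "59630"])
print(representative, polylines[representative][:2])
```

Whether the longest shape is a good stand-in depends on the route; for branching routes you would more likely want to draw each shape separately.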