I'm currently working with GTFS files from WMATA and Arlington Regional Transit in R. I'm using R because of Shiny; otherwise I would have used Python. I've had no issues with the static files (I'm using the tidytransit package for those), but as a new protocol buffer user I'm struggling with the realtime files (I spent hours trying to decode the binary output within R, if that's any indication).
I used httr2 to pull realtime trip updates from WMATA and stored the response in a .pb file. I read in the default GTFS proto file with readProtoFiles and the descriptors show up fine. I tried the following:
mb_trip <- read-methods("transit_realtime.TripUpdate", "update.pb")[["timestamp"]]
Error in .S3methods(generic.function, class, envir, dropPath = dropPath) : no function 'transit_realtime.TripUpdate' is visible
[EDIT: The below codeblock originally had the --decode_raw output. It now contains a portion of the decode output with proto structure as explained in the comments.]
header {
gtfs_realtime_version: "2.0"
incrementality: FULL_DATASET
timestamp: 1709699913
}
entity {
id: "23564020"
trip_update {
trip {
trip_id: "23564020"
start_date: "20240306"
route_id: "16A"
}
stop_time_update {
stop_sequence: 2
departure {
time: 1709701200
}
stop_id: "13524"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 3
arrival {
time: 1709701392
}
stop_id: "3625"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 4
arrival {
time: 1709701408
}
stop_id: "3641"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 5
arrival {
time: 1709701448
}
stop_id: "3680"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 6
arrival {
time: 1709701475
}
stop_id: "3696"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 7
arrival {
time: 1709701557
}
stop_id: "3858"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 8
arrival {
time: 1709701626
}
stop_id: "28172"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 10
arrival {
time: 1709701729
}
stop_id: "3727"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 11
arrival {
time: 1709701893
}
stop_id: "3581"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 12
arrival {
time: 1709701957
}
stop_id: "3534"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 13
arrival {
time: 1709702003
}
stop_id: "3507"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 14
arrival {
time: 1709702101
}
stop_id: "3429"
schedule_relationship: SCHEDULED
}
stop_time_update {
stop_sequence: 15
arrival {
time: 1709702176
My understanding is that the numbers are supposed to match the keys in the static GTFS files (I read this somewhere and can't find the source now). The first block (header) holds the GTFS-realtime version (2.0), incrementality, and timestamp. What you see after that is the trip update for Metrobus T18.
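The timestamp and time values in the output above are POSIX epoch seconds. A minimal base-R sketch for turning them into readable datetimes, using two values copied from the dump (the helper name to_utc is made up for illustration):

```r
# Epoch seconds copied from the decoded output above
feed_ts   <- 1709699913   # header timestamp
depart_ts <- 1709701200   # departure time of the first stop_time_update

# Hypothetical helper: interpret epoch seconds as a UTC datetime
to_utc <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

format(to_utc(feed_ts),   "%Y-%m-%d %H:%M:%S")  # "2024-03-06 04:38:33"
format(to_utc(depart_ts), "%Y-%m-%d %H:%M:%S")  # "2024-03-06 05:00:00"

# For display in local time, pass a different tz when formatting
format(to_utc(depart_ts), "%Y-%m-%d %H:%M", tz = "America/New_York")
```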
[Side note: I am aware there is a WMATA package for R, but my understanding is that the GTFS data is more up to date than the REST API. Also, GTFS is the only option for ART.]
I've looked at the documentation for proto structure. I followed the answer on this thread but wasn't able to fully replicate because of a 404 with the Sydney data (couldn't get the GTFS proto). Does pb data need to include the same headers as the proto? The decoded Sydney .pb file seemed to have the same structure (i.e. no headers, just IDs) as my WMATA data. Is there something I need to do with the API data before calling RProtoBuf::read-methods?
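On the header question: a binary protobuf payload carries only field numbers and values, and the field names come from the .proto descriptors. The decoded dump above has header and entity at the top level, which corresponds to FeedMessage in gtfs-realtime.proto, so parsing the file as a FeedMessage and walking down to the trip updates is probably the better fit. A minimal sketch, assuming the proto is saved locally as gtfs-realtime.proto and the response as update.pb (both paths and the helper name read_feed are assumptions, and this has not been run against the WMATA feed):

```r
# Hypothetical helper: parse a saved GTFS-realtime response as a FeedMessage.
# proto_path and pb_path are assumed local file paths.
read_feed <- function(proto_path = "gtfs-realtime.proto", pb_path = "update.pb") {
  # Registers the transit_realtime.* descriptors from the .proto file
  RProtoBuf::readProtoFiles(proto_path)
  # Look up the top-level message descriptor and parse the binary file with it
  RProtoBuf::read(RProtoBuf::P("transit_realtime.FeedMessage"), pb_path)
}

# Usage (needs RProtoBuf installed and both files on disk):
# feed  <- read_feed()
# feed$header$timestamp               # feed-level timestamp
# first <- feed$entity[[1]]           # repeated field comes back as a list
# first$trip_update$trip$route_id     # route of the first trip update
```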
For some reason, using transit_realtime.TripUpdate$read("update.pb")
rather than read-methods("transit_realtime.TripUpdate", "update.pb")
works here. I am still trying to understand what's happening in the background (tracing back yields nothing), and I still have some way to go in parsing the data, but the original question I posed has been answered.
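A possible explanation for the original error: read-methods appears to be the name of an RProtoBuf help page rather than a callable function. R parses read-methods(...) as read minus a call to methods(...), and methods() is the S3 method lister, which would account for the .S3methods message. A small base-R check of how the expression is parsed (nothing is evaluated, so no files are needed):

```r
# Capture the expression without evaluating it
expr <- quote(read-methods("transit_realtime.TripUpdate", "update.pb"))

as.character(expr[[1]])       # "-"        the outer call is subtraction
as.character(expr[[2]])       # "read"     left operand
as.character(expr[[3]][[1]])  # "methods"  right operand is a call to methods()
```

Under that reading, the descriptor's own $read() method, or read(descriptor, file), is the call that actually parses the file.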