Reading GTFS-realtime files using R?


I want to analyze GTFS-realtime files using R. Compared to static GTFS, these files are compiled, so reading them is trickier.

Googling around, I have only found one package for dealing with GTFS: https://github.com/ropenscilabs/gtfsr

However, it only handles static GTFS.

Are you aware of a CRAN or GitHub R package that deals with GTFS-realtime?

An alternative would be to convert the GTFS-RT data into a more readable format such as JSON (see "streaming gtfs real time data into human readable format").


Solution

  • GTFS-realtime feeds are binary Protocol Buffers, which can be processed with the RProtoBuf package.

    A simple worked example using my local South East Queensland Translink feed:

    library(RProtoBuf)
    

    Download and load the .proto file, which specifies the format the feed files follow:

    # Fetch the .proto schema and register its message types with RProtoBuf
    download.file(url = "https://gtfsrt.api.translink.com.au/api/realtime/protobuf", destfile = "translink-gtfs-realtime.proto")
    readProtoFiles("translink-gtfs-realtime.proto")
    

    List the descriptors that are now available in the descriptor pool for reading feeds:

    ls("RProtoBuf:DescriptorPool")
    ## [1] "GTFSv2.Realtime.Alert"             "GTFSv2.Realtime.EntitySelector"   
    ## [3] "GTFSv2.Realtime.FeedEntity"        "GTFSv2.Realtime.FeedHeader"       
    ## [5] "GTFSv2.Realtime.FeedMessage"       "GTFSv2.Realtime.Position"
    ## ...
    

    Download and read the feeds. Each file is a `FeedMessage`, and the individual records are in its repeated `entity` field:

    # mode = "wb" keeps the binary .pb files intact on Windows
    download.file(url = "https://gtfsrt.api.translink.com.au/api/realtime/SEQ/TripUpdates",
                  destfile = "SEQ-TripUpdates.pb", mode = "wb")
    download.file(url = "https://gtfsrt.api.translink.com.au/api/realtime/SEQ/VehiclePositions",
                  destfile = "SEQ-VehiclePositions.pb", mode = "wb")
    
    # Parse each file as a FeedMessage and keep its list of entities
    vehicle_position_feed <- read(GTFSv2.Realtime.FeedMessage, "SEQ-VehiclePositions.pb")[["entity"]]
    trip_update_feed <- read(GTFSv2.Realtime.FeedMessage, "SEQ-TripUpdates.pb")[["entity"]]
    

    Once read, each feed is a list of RProtoBuf `Message` objects, each holding an external pointer to the parsed message rather than ordinary R data:

    str(vehicle_position_feed)
    ##List of 6
    ## $ :Formal class 'Message' [package "RProtoBuf"] with 2 slots
    ##  .. ..@ pointer:<externalptr> 
    ##  .. ..@ type   : chr "GTFSv2.Realtime.FeedEntity"
    ## $ :Formal class 'Message' [package "RProtoBuf"] with 2 slots
    ##  .. ..@ pointer:<externalptr> 
    ##  .. ..@ type   : chr "GTFSv2.Realtime.FeedEntity"
    ## .. 
    

    You can then extract fields from each entity by looping over the list to build data frames to work with, e.g.:

    # \(x) is the lambda shorthand for function(x), available from R 4.1.0
    data.frame(
      id = sapply(vehicle_position_feed, \(x) x[["id"]] ),
      latitude = sapply(vehicle_position_feed, \(x) x[["vehicle"]][["position"]][["latitude"]] ),
      longitude = sapply(vehicle_position_feed, \(x) x[["vehicle"]][["position"]][["longitude"]] )
    )
    ##                   id  latitude longitude
    ##1     VU-2123549587_1 -27.06561  153.1595
    ##2    VU-1176076363_10 -27.30158  152.9881
    ##3   VU--1272517086_10 -27.49080  153.2397
    ## ...
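
The three `sapply()` calls above walk the entity list once per column and assume every entity carries a vehicle position. Depending on how a missing field is represented, an entity without one can yield a list instead of a numeric vector, or a default value instead of `NA`. Below is a minimal single-pass sketch that fills in `NA` instead. The helper names `entities_to_positions` and `has_field` are hypothetical. So that it runs without a live feed, it is exercised on plain R lists standing in for `FeedEntity` messages; with real RProtoBuf messages you would presumably pass `has_field = RProtoBuf::has`, which is an assumption worth checking against your installed RProtoBuf version.

```r
# Hypothetical helper: flatten vehicle-position entities into one data frame
# in a single pass, using NA where an entity has no vehicle position.
# has_field(x, name) decides whether field `name` is set on `x`. The default
# works for plain R lists; for RProtoBuf messages, passing RProtoBuf::has is
# an assumption, not something confirmed here.
entities_to_positions <- function(entities,
                                  has_field = function(x, name) !is.null(x[[name]])) {
  rows <- lapply(entities, function(e) {
    lat <- NA_real_
    lon <- NA_real_
    if (has_field(e, "vehicle") && has_field(e[["vehicle"]], "position")) {
      pos <- e[["vehicle"]][["position"]]
      lat <- pos[["latitude"]]
      lon <- pos[["longitude"]]
    }
    # One small data frame per entity keeps each row's fields aligned
    data.frame(id = e[["id"]], latitude = lat, longitude = lon,
               stringsAsFactors = FALSE)
  })
  # Bind all rows at once (returns NULL for an empty feed)
  do.call(rbind, rows)
}

# Plain-list stand-ins for two FeedEntity messages (made-up values);
# the second entity has no vehicle position.
entities <- list(
  list(id = "a", vehicle = list(position = list(latitude = -27.5, longitude = 153.0))),
  list(id = "b")
)
positions <- entities_to_positions(entities)
print(positions)
```

Building one row per entity and binding once trades a little speed for the guarantee that an entity's id and coordinates always end up in the same row.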
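
`trip_update_feed` can be turned into a data frame in the same spirit, except that each trip update holds a repeated list of stop time updates, so one entity expands into several rows. The following is a minimal sketch, assuming the Translink proto uses the field names of the GTFS-realtime specification (`trip_update`, `trip`, `trip_id`, `stop_time_update`, `stop_sequence`, `arrival`, `delay`). `trip_updates_to_rows`, `field_or_na` and `has_field` are hypothetical helpers. It is exercised on plain R lists standing in for the parsed messages; with RProtoBuf you would presumably pass `has_field = RProtoBuf::has`, which is an assumption.

```r
# Return x[[name]] if the field is set, otherwise the supplied `na` value.
field_or_na <- function(x, name, has_field, na) {
  if (!is.null(x) && has_field(x, name)) x[[name]] else na
}

# Hypothetical helper: one row per stop_time_update in each TripUpdate entity.
# Field names follow the GTFS-realtime specification and are assumed to match
# the Translink proto.
trip_updates_to_rows <- function(entities,
                                 has_field = function(x, name) !is.null(x[[name]])) {
  rows <- list()
  for (e in entities) {
    # Skip entities that are not trip updates (e.g. vehicle positions)
    if (!has_field(e, "trip_update")) next
    tu <- e[["trip_update"]]
    trip <- field_or_na(tu, "trip", has_field, NULL)
    trip_id <- field_or_na(trip, "trip_id", has_field, NA_character_)
    # stop_time_update is a repeated field, so iterate over its elements
    for (stu in tu[["stop_time_update"]]) {
      arrival <- field_or_na(stu, "arrival", has_field, NULL)
      rows[[length(rows) + 1]] <- data.frame(
        trip_id       = trip_id,
        stop_sequence = field_or_na(stu, "stop_sequence", has_field, NA_integer_),
        arrival_delay = field_or_na(arrival, "delay", has_field, NA_integer_),
        stringsAsFactors = FALSE
      )
    }
  }
  # NULL when no entity carried a trip update
  do.call(rbind, rows)
}

# Plain-list stand-ins (made-up values): one trip update with two stops,
# the second without arrival information, plus one non-trip-update entity.
entities <- list(
  list(id = "t1", trip_update = list(
    trip = list(trip_id = "T1"),
    stop_time_update = list(
      list(stop_sequence = 1L, arrival = list(delay = 60L)),
      list(stop_sequence = 2L)
    )
  )),
  list(id = "v1", vehicle = list(position = list(latitude = -27.5, longitude = 153)))
)
stop_rows <- trip_updates_to_rows(entities)
print(stop_rows)
```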