I am trying to extract data from Philadelphia's public transit API using the httr2 package. I load the packages and call the API in the code below
library(tidyverse)
library(httr2)
septaURL <- 'https://www3.septa.org/api/TransitViewAll'
response <- request(septaURL) %>%
req_perform()
It returns a bunch of nested lists. I use the pluck function to drill down to a list of 128 items (the number of bus routes), which are lists of varying length (depending on how many buses are driving that route), which themselves are lists of 19 items (the data I want to retrieve).
response %>%
resp_body_json() %>%
pluck('routes', 1) %>%
glimpse()
However when I pipe in the map function to extract the data, I get a tibble of 0 columns, 0 rows.
response %>%
resp_body_json() %>%
pluck('routes', 1) %>%
#map_dfr(
\(x) {
tibble(
Route = x %>% pluck('route_id'),
Trip = x %>% pluck('trip'),
Late = x %>% pluck('late'),
seatsAvailable = x %>% pluck('estimated_seat_availability')
)
}
) %>%
glimpse()
If I add an additional argument to the pluck function, it gives me the data I want for a single route in a tibble. But I want the data from all 128 routes.
response %>%
resp_body_json() %>%
pluck('routes', 1, 1) %>% #Added an argument here
#map_dfr(
\(x) {
tibble(
Route = x %>% pluck('route_id'),
Trip = x %>% pluck('trip'),
Late = x %>% pluck('late'),
seatsAvailable = x %>% pluck('estimated_seat_availability')
)
}
) %>%
glimpse()
How do I get a tibble like the one produced above but for all routes?
There are many ways to do this:
I think this is the simplest solution. This isn't an overly large data set, so make your life easier and create a large data frame and select the columns.
response %>%
resp_body_json() %>%
pluck('routes', 1) %>%
map_dfr(bind_rows) %>%
select(Route = route_id, Trip = trip, Late = late, seatsAvailable = estimated_seat_availability)
map_*
variantresponse %>%
resp_body_json() %>%
pluck('routes', 1) %>%
map_dfr(
\(x) {
tibble(
Route = map_chr(x, "route_id"),
Trip = map_chr(x, "trip"),
Late = map_int(x, "late"),
seatsAvailable = map_chr(x, "estimated_seat_availability")
)
}
)
_chr
will return a character vector, _int
will return an integer vector and so forth. map
functions are often used to apply a function over a list/vector, but they can also be used as selectors, which is the case here.
map_depth()
If you needed to iterate at a depth other than 1 you could use this:
response %>%
resp_body_json() %>%
pluck('routes', 1) %>%
map_dfr(
\(x) {
tibble(
Route = unlist(map_depth(x, 1, "route_id")),
Trip = unlist(map_depth(x, 1, "trip")),
Late = unlist(map_depth(x, 1, "late")),
seatsAvailable = unlist(map_depth(x, 1, "estimated_seat_availability"))
)
}
)
Since map_depth()
will return a list, we use unlist()
to get a vector (technically it will be a named vector but you can remove the names by specifying unlist(map_depth(...), use.names = F)
).
Also, since we are talking about Septa, I can save you some time pulling the Late
column. Just always assume it is late :D.
Output
Route Trip Late seatsAvailable
<chr> <chr> <int> <chr>
1 1 719611 10 NOT_AVAILABLE
2 1 719643 0 NOT_AVAILABLE
3 1 719622 7 NOT_AVAILABLE
4 1 719642 7 NOT_AVAILABLE
5 1 719645 16 MANY_SEATS_AVAILABLE
6 1 719624 -1 MANY_SEATS_AVAILABLE
7 1 719612 998 NOT_AVAILABLE
8 1 719607 0 NOT_AVAILABLE
9 10 777333 -1 NOT_AVAILABLE
10 10 777498 998 NOT_AVAILABLE
# ℹ 888 more rows