Is it possible to reduce the run time of the following code?
My goal is to get an weighted igraph object from open street data area specified by a box boundary.
Currently im trying to use the overpass api so to offload the memory load so I dont have to keep big osm files in memory.
First I get a osm data specified by a bbox (only streets) as an xml structure
library(osmdata)
library(osmar)
install.packages("remotes")
remotes::install_github("hypertidy/scgraph")
library(scgraph)
dat <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743 )) %>%
add_osm_feature(key = 'highway',value = c("trunk", "trunk_link", "primary","primary_link", "secondary", "secondary_link", "tertiary","tertiary_link", "residential", "unclassified" ))%>%
osmdata_xml ()
Then I convert the resulting xml object dat to an osmar object dat_osmar and finally to an igraph object:
dat_osmar <-as_osmar(xmlParse(dat))
dat_graoh <- as_igraph(dat_osmar)
How could I optimize these routines?
Maybe it is possible to separate dat (XML) object in chunks and parse it in parallell?
I go through several steps only to finally to get to a weighted non directed graph.
Currently the whole process takes 89.555 sec on my machine.
If I could cut the running time of these two stepps:
dat_osmar <-as_osmar(xmlParse(dat))
dat_graoh <- as_igraph(dat_osmar)
that would help already.
One of the approaches I tried is to use osmdata_sc() instead of osmdata_xml().
This provides an silicate object and I can convert it with:
scgraph::sc_as_igraph(dat)
to an igraph.
It is decently fast but sadly the weights are getting lost so its not a solution.
The reason for it is: if I use the conversion from osmar object to an igraph object with the function osmar::as_igraph()
the weight is calculated based on the distances between two edges and added to the igraph:
edges <- lapply(dat, function(x) {
n <- nrow(x)
from <- 1:(n - 1)
to <- 2:n
weights <- distHaversine(x[from, c("lon", "lat")], x[to,
c("lon", "lat")])
cbind(from_node_id = x[from, "ref"], to_node_id = x[to,
"ref"], way_id = x[1, "id"], weights = weights)
})
This is missing from scgraph::sc_as_igraph(dat)
If this could be added to silicate to igraph conversion
I could skip the dat_osmar <-as_osmar(xmlParse(dat))
step
and go overpass->silicate->igraph
route which is much faster istead of overpass->xml->osmar->igraph
.
osmdata package also provides a sf response with osmdata_sf()
so maybe the workflow overpass->sf->igraph
is faster but also while using this way I would need the weights incorporated into the graph based on the distance of edges and im not good enough to do it currently and would realy appreciate any help.
Furthermore the connection between openstreetmap gps points and their IDs should not be lost while using sf and resulting igraph object. Meaning I should be able to find gps position to an ID from the resulting Igraph. A lookup table would be enough. If i go overpass->silicate->igraph
or overpass->xml->osmar->igraph
routes it is possible. Im not sure if it is still would be posible with overpass->sf->igraph
route.
If you want to create a graph object starting from a network of roads in R, then I would use the following procedure.
First of all, I need to install sfnetworks
from the github repo (since we recently fixed some bugs and the newest version is not on CRAN)
remotes::install_github("luukvdmeer/sfnetworks", quiet = TRUE)
Then load packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(tidygraph)
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sfnetworks)
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Download data from Overpass API
my_osm_data <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743 )) %>%
add_osm_feature(
key = 'highway',
value = c("trunk", "trunk_link", "primary","primary_link", "secondary", "secondary_link", "tertiary","tertiary_link", "residential", "unclassified")
) %>%
osmdata_sf(quiet = FALSE)
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sf format
Now I extract the roads and build the sfnetwork object:
system.time({
# extract the roads
my_roads <- st_geometry(my_osm_data$osm_lines)
# build the sfnetwork object
my_sfn <- as_sfnetwork(my_roads, directed = FALSE, length_as_weight = TRUE)
})
#> user system elapsed
#> 3.03 0.16 3.28
As you can see, after downloading the OSM data, it takes just a couple of seconds to run that procedure.
At the moment I ignore all fields in my_osm_data$osm_lines
, but if you need to add some of the columns in my_osm_data$osm_lines
to my_roads
, then you can modify the previous code as follows: my_roads <- my_osm_data$osm_lines[, "relevant columns"]
.
A few details regarding the construction of the sfnetwork
object: the parameter "directed = FALSE"
specifies that we want to build an undirected graph (see the docs, here and here for more details), while the parameter length_as_weight = TRUE
says that the length of the edges will be stored in a column called "weight"
and used by igraph and tidygraph algorithms.
This is the printing of my_sfn
object:
my_sfn
#> # A sfnetwork with 33179 nodes and 28439 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An undirected multigraph with 6312 components with spatially explicit edges
#> #
#> Registered S3 method overwritten by 'cli':
#> method from
#> print.boxx spatstat.geom
#> # Node Data: 33,179 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> x
#> <POINT [°]>
#> 1 (11.68861 47.90971)
#> 2 (11.68454 47.90937)
#> 3 (11.75216 48.17638)
#> 4 (11.75358 48.17438)
#> 5 (11.7528 48.17351)
#> 6 (11.74822 48.17286)
#> # ... with 33,173 more rows
#> #
#> # Edge Data: 28,439 x 4
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> from to x weight
#> <int> <int> <LINESTRING [°]> <dbl>
#> 1 1 2 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 1~ 306.
#> 2 3 4 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, ~ 246.
#> 3 5 6 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 1~ 382.
#> # ... with 28,436 more rows
my_sfn
is an igraph object by definition:
class(my_sfn)
#> [1] "sfnetwork" "tbl_graph" "igraph"
but, if you want to be more explicit, then
as.igraph(my_sfn)
#> IGRAPH 101dcdf U-W- 33179 28439 --
#> + attr: x (v/x), x (e/x), weight (e/n)
#> + edges from 101dcdf:
#> [1] 1-- 2 3-- 4 5-- 6 7-- 8 9-- 10 11-- 12 13-- 14 15-- 16
#> [9] 17-- 18 16-- 19 20-- 21 21-- 22 23-- 24 25-- 26 27-- 28 29-- 30
#> [17] 31-- 32 33-- 34 35-- 36 37-- 38 39-- 40 41-- 42 43-- 44 45-- 46
#> [25] 14-- 47 48-- 49 50-- 51 52-- 53 54-- 55 56-- 57 36-- 58 58-- 59
#> [33] 60-- 61 62-- 63 64-- 65 66-- 67 68-- 69 70-- 71 72-- 73 74-- 75
#> [41] 76-- 77 78-- 79 80-- 81 82-- 83 84-- 85 86-- 87 88-- 89 90-- 91
#> [49] 92-- 93 94-- 95 96-- 97 98-- 99 100--101 102--103 104--105 106--107
#> [57] 108--109 110--111 112--113 80--114 115--116 117--118 119--120 121--122
#> + ... omitted several edges
You can see that the edges have a weight attribute that is equal to the length of each LINESTRING geometry:
all.equal(
target = igraph::edge_attr(as.igraph(my_sfn), "weight"),
current = as.numeric(st_length(my_roads))
)
#> [1] TRUE
Created on 2021-03-26 by the reprex package (v1.0.0)
If you want to read more details regarding sfnetworks
, then you can check the website and the introductory vignettes. That being said, I don't understand what you mean by
connection between openstreetmap gps points and their IDs should not be lost
Can you add more details with a comment or an edit to the original question? Why do you need to OSM id? And what do you mean by OSM id? I think that I need more details to expand this answer.
EDIT
I just re-read @mrhellmann 's answer, and I noticed that I forgot to convert POLYGON data to lines. Anyway, I would suggest applying osmdata::osm_poly2line()
immediately after running the code to download OSM data via Overpass API.