Tags: r, openstreetmap, r-sf, overpass-api

R: How can I get only ways from the Overpass API and reduce the amount of data?


I'm trying to reduce the amount of data returned by a query to the Overpass server, and the time it takes. I'm only interested in ways, and this is my current approach using the osmdata package:

library(osmdata)

bbox_dimensions <- c(xmin = 11.2360151977671, ymin = 47.8047832575026, xmax = 11.8886729361838, ymax = 48.2426118570748)

my_osm_data <- opq(bbox = bbox_dimensions, timeout = 180, memsize = 104857600) %>%
  add_osm_feature(
    key = "highway",
    value = c("primary", "secondary", "tertiary")
  ) %>%
  osmdata_sf(quiet = FALSE)

Is it possible to reduce the amount of data this query returns? I'm only interested in the ways, not in the nodes along them.
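One way to see where the extra data comes from is to print the Overpass QL that osmdata generates, which requires no request to the server. This is a sketch assuming osmdata's `opq_string()` helper; the exact text differs between osmdata versions, but it typically asks for the matching elements and then recurses down (`>`) to their nodes, because osmdata needs node coordinates to build the line geometries.

```r
library(osmdata)

bbox_dimensions <- c(xmin = 11.2360151977671, ymin = 47.8047832575026,
                     xmax = 11.8886729361838, ymax = 48.2426118570748)

# Build the same query as above, but stop before sending it to the server
q <- add_osm_feature(
  opq(bbox = bbox_dimensions, timeout = 180, memsize = 104857600),
  key = "highway",
  value = c("primary", "secondary", "tertiary")
)

# Overpass QL string that osmdata would submit
query_text <- opq_string(q)
cat(query_text)
```

Once `osmdata_sf()` has returned, `my_osm_data$osm_lines` holds only the ways as line features, so the other components can be discarded locally; this does not, however, shrink the download itself.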


Solution

  • As I wrote in the comment, I would suggest the following approach if you need to run several queries for OSM data that belong to the same geographical area.

    First of all, load the packages:

    library(sf)
    #> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
    library(osmextract)
    #> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
    #> Check the package website, https://docs.ropensci.org/osmextract/, for more details.
    library(tmap)
    tmap_mode("view")
    #> tmap mode set to interactive viewing
    

    Define the bbox and convert it to an sfc object (see the discussion on GitHub):

    my_bbox <- st_bbox(
      c(xmin = 11.2360151977671, ymin = 47.8047832575026, xmax = 11.8886729361838, ymax = 48.2426118570748), 
      crs = 4326
    )
    my_bbox_poly <- st_as_sfc(my_bbox)
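To check what the bbox conversion produces before passing it on, the following standalone sketch rebuilds `my_bbox_poly` and prints its WKT text, which is the form used later by `wkt_filter` and `-clipsrc`.

```r
library(sf)

my_bbox <- st_bbox(
  c(xmin = 11.2360151977671, ymin = 47.8047832575026,
    xmax = 11.8886729361838, ymax = 48.2426118570748),
  crs = 4326
)
my_bbox_poly <- st_as_sfc(my_bbox)

# A single rectangular polygon in WGS 84 (EPSG:4326)
print(class(my_bbox_poly))

# WKT representation of the polygon, as used by the later filtering steps
bbox_wkt <- st_as_text(my_bbox_poly)
cat(bbox_wkt, "\n")
```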
    

    Then we need to download an OSM extract for a geographical area that covers all your queries. If you are working with data in Germany, I would suggest checking the geofabrik and bbbike providers:

    oe_match(my_bbox_poly, provider = "geofabrik")
    #> The input place was matched with multiple geographical areas.
    #> Selecting the smallest administrative unit. Check ?oe_match for more details.
    #> $url
    #> [1] "https://download.geofabrik.de/europe/germany/bayern/oberbayern-latest.osm.pbf"
    #> 
    #> $file_size
    #> [1] 185338670
    oe_match(my_bbox_poly, provider = "bbbike")
    #> $url
    #> [1] "https://download.bbbike.org/osm/bbbike/Muenchen/Muenchen.osm.pbf"
    #> 
    #> $file_size
    #> [1] 58400897
    

    The extract returned by the bbbike provider is much smaller than the one returned by geofabrik (about 58 MB versus 185 MB); hence I will run the following steps using the OSM data from bbbike.
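The provider comparison can also be scripted. This is a sketch assuming `oe_match()` accepts `quiet = TRUE` and returns a list with a `file_size` element, as in the output above; `providers`, `sizes` and `best_provider` are names introduced here for illustration.

```r
library(sf)
library(osmextract)

my_bbox_poly <- st_as_sfc(st_bbox(
  c(xmin = 11.2360151977671, ymin = 47.8047832575026,
    xmax = 11.8886729361838, ymax = 48.2426118570748),
  crs = 4326
))

providers <- c("geofabrik", "bbbike")

# Match the bbox against each provider's zones (the extracts are not downloaded)
matches <- lapply(providers, function(p) oe_match(my_bbox_poly, provider = p, quiet = TRUE))

# Extract sizes in bytes, as reported by oe_match()
sizes <- vapply(matches, function(m) as.numeric(m$file_size), numeric(1))
names(sizes) <- providers

# Provider offering the smallest extract that covers the bbox
best_provider <- providers[which.min(sizes)]
print(round(sizes / 1e6, 1))  # sizes in MB
print(best_provider)
```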

    oe_get("Muenchen", provider = "bbbike", download_only = TRUE, skip_vectortranslate = TRUE)
    #> The input place was matched with: Muenchen
    #> File downloaded!
    #> [1] "C:\\Users\\Utente\\Documents\\osm-data\\bbbike_Muenchen.osm.pbf"
    

    Then, if you want to read in only the lines that intersect a particular bbox and have certain characteristics, I would suggest the following approach:

    lines_v1 <- oe_get(
      place = "Muenchen", # or place = my_bbox_poly
      layer = "lines", 
      provider = "bbbike", 
      query = "SELECT * FROM lines WHERE highway IN ('primary', 'secondary', 'tertiary')", 
      wkt_filter = st_as_text(my_bbox_poly) 
    )
    #> The input place was matched with: Muenchen
    #> The chosen file was already detected in the download directory. Skip downloading.
    #> Start with the vectortranslate operations on the input file!
    #> 0...10...20...30...40...50...60...70...80...90...100 - done.
    #> Finished the vectortranslate operations on the input file!
    #> Reading layer `lines' from data source `C:\Users\Utente\Documents\osm-data\bbbike_Muenchen.gpkg' using driver `GPKG'
    #> Simple feature collection with 13032 features and 9 fields
    #> Geometry type: LINESTRING
    #> Dimension:     XY
    #> Bounding box:  xmin: 11.19608 ymin: 47.80002 xmax: 11.89542 ymax: 48.25359
    #> Geodetic CRS:  WGS 84
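The `query` argument is a plain SQL string, so when the list of highway types changes it can be assembled from a character vector instead of being typed by hand. A minimal base-R sketch; `highway_types` and `sql_query` are illustrative names, and the result would be passed as `query = sql_query` to `oe_get()`.

```r
highway_types <- c("primary", "secondary", "tertiary")

# Quote each value and join them into the IN (...) list
in_list <- paste0("'", highway_types, "'", collapse = ", ")
sql_query <- sprintf("SELECT * FROM lines WHERE highway IN (%s)", in_list)

cat(sql_query, "\n")
#> SELECT * FROM lines WHERE highway IN ('primary', 'secondary', 'tertiary')
```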
    

    Please note that the function recognises that you have already downloaded the OSM extract and skips downloading the same file again. This process can be optimised if you set a persistent download directory. See here for more details.
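A sketch of setting the download directory, assuming the `OSMEXT_DOWNLOAD_DIRECTORY` environment variable that osmextract reads. `Sys.setenv()` only affects the current R session; putting the same variable in your `.Renviron` file makes it persistent. The path below is a placeholder.

```r
library(osmextract)

# Placeholder path: in practice use a folder that survives between sessions,
# e.g. file.path(path.expand("~"), "osm-data")
osm_dir <- file.path(tempdir(), "osm-data")
dir.create(osm_dir, showWarnings = FALSE, recursive = TRUE)

# Tell osmextract where to store (and look for) downloaded extracts
Sys.setenv(OSMEXT_DOWNLOAD_DIRECTORY = osm_dir)

# Directory osmextract will now use
print(oe_download_directory())
```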

    # Check result
    tm_shape(my_bbox_poly) + 
      tm_borders(col = "darkred") + 
    tm_shape(lines_v1) + 
      tm_lines(lwd = 2)
    

    A more efficient (but much trickier) approach is the following:

    lines_v2 <- oe_get(
      place = "Muenchen", 
      layer = "lines", 
      provider = "bbbike", 
      vectortranslate_options = c(
        "-f", "GPKG", 
        "-overwrite", 
        "-where", "highway IN ('primary', 'secondary', 'tertiary')", 
        "-clipsrc", st_as_text(my_bbox_poly), 
        "-nlt", "PROMOTE_TO_MULTI",
        "lines"
      )
    )
    #> The input place was matched with: Muenchen
    #> The chosen file was already detected in the download directory. Skip downloading.
    #> Start with the vectortranslate operations on the input file!
    #> 0...10...20...30...40...50...60...70...80...90...100 - done.
    #> Finished the vectortranslate operations on the input file!
    #> Reading layer `lines' from data source `C:\Users\Utente\Documents\osm-data\bbbike_Muenchen.gpkg' using driver `GPKG'
    #> Simple feature collection with 13027 features and 9 fields
    #> Geometry type: MULTILINESTRING
    #> Dimension:     XY
    #> Bounding box:  xmin: 11.23602 ymin: 47.80478 xmax: 11.88867 ymax: 48.24261
    #> Geodetic CRS:  WGS 84
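Because the `vectortranslate_options` vector is the error-prone part, it can help to generate it with a small helper. `make_vt_options()` below is a hypothetical function, not part of osmextract; it only reproduces the options used for `lines_v2`, with the layer name as the last element. In the real call, `clip_wkt` would be `st_as_text(my_bbox_poly)`.

```r
# Hypothetical helper: assembles the ogr2ogr-style options used for lines_v2
make_vt_options <- function(highway_types, clip_wkt, layer = "lines") {
  where <- sprintf(
    "highway IN (%s)",
    paste0("'", highway_types, "'", collapse = ", ")
  )
  c(
    "-f", "GPKG",               # write a GeoPackage
    "-overwrite",               # recreate the output layer if it already exists
    "-where", where,            # keep only the requested highway types
    "-clipsrc", clip_wkt,       # clip geometries to this polygon
    "-nlt", "PROMOTE_TO_MULTI", # promote geometries to their multi types
    layer                       # layer to translate
  )
}

# Example with a small placeholder WKT polygon
opts <- make_vt_options(
  c("primary", "secondary", "tertiary"),
  "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
)
print(opts)
```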
    

    Graphical check

    # Check result
    tm_shape(my_bbox_poly) + 
      tm_borders(col = "darkred") + 
    tm_shape(lines_v2) + 
      tm_lines(lwd = 2)
    

    Created on 2021-03-31 by the reprex package (v1.0.0)

    Summary:

    1. If you need to import OSM data several times, then you should set a persistent download directory. That way, you don't need to download an OSM extract every time you run a new query (unless the requested data are not included in any of the existing extracts).
    2. If you need to import OSM lines covering a medium/small geographical region, I would suggest adopting the "query" approach (i.e. lines_v1).
    3. The second approach has several benefits (e.g. it's faster than the first one, especially for larger extracts, and, as you can see from the previous plot, it clips the lines instead of selecting the roads that intersect the box). On the other hand, it's quite difficult to write the vectortranslate options from scratch (we are working on a more intuitive API, but it's still under development). Moreover, that option modifies the underlying structure of the .gpkg file, which may have relevant consequences. We are working on a solution for both problems, but you will need to wait for version 0.3 or 0.4.
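The difference between selecting and clipping described in point 3 can be reproduced with sf on toy data, independently of osmextract. In this sketch (made-up coordinates, no CRS), `st_filter()` keeps any line that intersects the box at its full length, similar to the `wkt_filter` result, while `st_intersection()` cuts the line at the box edge, similar to `-clipsrc`.

```r
library(sf)

# Unit square acting as the bounding box
box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)))

# One road crossing the box (length 3) and one road entirely outside it
roads <- st_sf(
  id = c("crossing", "outside"),
  geometry = st_sfc(
    st_linestring(rbind(c(-1, 0.5), c(2, 0.5))),
    st_linestring(rbind(c(3, 3), c(4, 4)))
  )
)

# Selection: features intersecting the box are kept unchanged
selected <- st_filter(roads, box)

# Clipping: geometries are cut down to the part inside the box
clipped <- st_intersection(roads, box)

print(as.numeric(st_length(selected)))  # full length of the crossing road
print(as.numeric(st_length(clipped)))   # only the part inside the box
```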

    Check here, here, and here for more details about osmextract.

    Feel free to add here any question or comment.