R: how can I get only ways from the Overpass API and reduce the amount of data?

I'm trying to reduce the amount of data returned by a query to the Overpass server and the time the query takes. I'm only interested in ways, and this is my current approach using the osmdata package:

library(osmdata)

bbox_dimensions <- c(xmin = 11.2360151977671, ymin = 47.8047832575026, xmax = 11.8886729361838, ymax = 48.2426118570748)

my_osm_data <- opq(bbox = bbox_dimensions, timeout = 180, memsize = 104857600) %>%
  add_osm_feature(
    key = "highway",
    value = c("primary", "secondary", "tertiary")
  ) %>%
  osmdata_sf(quiet = FALSE)

Is it possible to reduce the amount of data returned by this query? I'm only interested in the ways, not in the nodes along them.

Solution

As I wrote in the comment, I would suggest the following approach if you need to run several queries for OSM data that belong to the same geographical area.

First of all, load the packages:

library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(osmextract)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Check the package website, https://docs.ropensci.org/osmextract/, for more details.
library(tmap)
tmap_mode("view")
#> tmap mode set to interactive viewing

Define the bbox and convert it to an sfc object (see the discussion on github):

my_bbox <- st_bbox(
  c(xmin = 11.2360151977671, ymin = 47.8047832575026, xmax = 11.8886729361838, ymax = 48.2426118570748), 
  crs = 4326
)
my_bbox_poly <- st_as_sfc(my_bbox)

Then we need to download an OSM extract for a geographical area that covers all your queries. If you are working with data in Germany, then I would suggest checking the geofabrik and bbbike providers:

oe_match(my_bbox_poly, provider = "geofabrik")
#> The input place was matched with multiple geographical areas.
#> Selecting the smallest administrative unit. Check ?oe_match for more details.
#> $url
#> [1] "https://download.geofabrik.de/europe/germany/bayern/oberbayern-latest.osm.pbf"
#> 
#> $file_size
#> [1] 185338670
oe_match(my_bbox_poly, provider = "bbbike")
#> $url
#> [1] "https://download.bbbike.org/osm/bbbike/Muenchen/Muenchen.osm.pbf"
#> 
#> $file_size
#> [1] 58400897

The extract returned by the bbbike provider is much smaller than the extract returned by geofabrik; hence I will run the following steps using the OSM data returned by bbbike.
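To put the two file sizes reported by oe_match() in perspective, the following base R lines convert them from bytes to mebibytes and compute their ratio. The variable names are only illustrative, and the sizes are copied from the output above, so they will change as the providers update their extracts.

```r
# File sizes (in bytes) as printed by oe_match() above
geofabrik_bytes <- 185338670
bbbike_bytes <- 58400897

# Convert to MiB (1 MiB = 1024^2 bytes)
geofabrik_mib <- geofabrik_bytes / 1024^2
bbbike_mib <- bbbike_bytes / 1024^2

# How many times larger is the geofabrik extract?
size_ratio <- geofabrik_bytes / bbbike_bytes

round(c(geofabrik = geofabrik_mib, bbbike = bbbike_mib), 1)
round(size_ratio, 1)
```

The geofabrik extract is roughly three times as large as the bbbike one, which is why the smaller file is used in the following steps.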

oe_get("Muenchen", provider = "bbbike", download_only = TRUE, skip_vectortranslate = TRUE)
#> The input place was matched with: Muenchen
#> File downloaded!
#> [1] "C:\\Users\\Utente\\Documents\\osm-data\\bbbike_Muenchen.osm.pbf"

Then, if you want to read in the lines that fall within a particular bbox and have certain characteristics, I would suggest the following approach:

lines_v1 <- oe_get(
  place = "Muenchen", # or place = my_bbox_poly
  layer = "lines", 
  provider = "bbbike", 
  query = "SELECT * FROM lines WHERE highway IN ('primary', 'secondary', 'tertiary')", 
  wkt_filter = st_as_text(my_bbox_poly) 
)
#> The input place was matched with: Muenchen
#> The chosen file was already detected in the download directory. Skip downloading.
#> Start with the vectortranslate operations on the input file!
#> 0...10...20...30...40...50...60...70...80...90...100 - done.
#> Finished the vectortranslate operations on the input file!
#> Reading layer `lines' from data source `C:\Users\Utente\Documents\osm-data\bbbike_Muenchen.gpkg' using driver `GPKG'
#> Simple feature collection with 13032 features and 9 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 11.19608 ymin: 47.80002 xmax: 11.89542 ymax: 48.25359
#> Geodetic CRS:  WGS 84

Please note that the function recognises that you have already downloaded the OSM extract and skips downloading the same file again. This process can be optimised if you set a persistent download directory. See the package website (https://docs.ropensci.org/osmextract/) for more details.
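A minimal sketch of how a persistent download directory could be set for the current session. It assumes that osmextract reads the directory from the OSMEXT_DOWNLOAD_DIRECTORY environment variable, as described in the package documentation; the folder name below is hypothetical, so replace it with a location of your choice.

```r
# Hypothetical folder used to store all OSM extracts; change it as needed
osm_dir <- file.path(path.expand("~"), "osm-data")
dir.create(osm_dir, showWarnings = FALSE, recursive = TRUE)

# Valid for the current R session only
Sys.setenv(OSMEXT_DOWNLOAD_DIRECTORY = osm_dir)
Sys.getenv("OSMEXT_DOWNLOAD_DIRECTORY")
```

To keep the setting across sessions, the same variable can be added to your .Renviron file (a line of the form OSMEXT_DOWNLOAD_DIRECTORY=/path/to/osm-data), after which R must be restarted.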

# Check result
tm_shape(my_bbox_poly) + 
  tm_borders(col = "darkred") + 
tm_shape(lines_v1) + 
  tm_lines(lwd = 2)

A more efficient (but much trickier) approach is the following:

lines_v2 <- oe_get(
  place = "Muenchen", 
  layer = "lines", 
  provider = "bbbike", 
  vectortranslate_options = c(
    "-f", "GPKG", 
    "-overwrite", 
    "-where", "highway IN ('primary', 'secondary', 'tertiary')", 
    "-clipsrc", st_as_text(my_bbox_poly), 
    "-nlt", "PROMOTE_TO_MULTI",
    "lines"
  )
)
#> The input place was matched with: Muenchen
#> The chosen file was already detected in the download directory. Skip downloading.
#> Start with the vectortranslate operations on the input file!
#> 0...10...20...30...40...50...60...70...80...90...100 - done.
#> Finished the vectortranslate operations on the input file!
#> Reading layer `lines' from data source `C:\Users\Utente\Documents\osm-data\bbbike_Muenchen.gpkg' using driver `GPKG'
#> Simple feature collection with 13027 features and 9 fields
#> Geometry type: MULTILINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 11.23602 ymin: 47.80478 xmax: 11.88867 ymax: 48.24261
#> Geodetic CRS:  WGS 84
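
Since the vectortranslate options are easy to get wrong, one option is to wrap them in a small helper. The sketch below is not part of osmextract: build_vt_options() and its arguments are hypothetical names, and it simply reassembles the same character vector used for lines_v2 (ogr2ogr-style arguments, with the layer name as the last element).

```r
# Hypothetical helper (not an osmextract function): builds the character
# vector passed to the vectortranslate_options argument of oe_get()
build_vt_options <- function(layer, key, values, clip_wkt = NULL) {
  # e.g. "highway IN ('primary', 'secondary', 'tertiary')"
  where <- sprintf(
    "%s IN (%s)",
    key,
    paste0("'", values, "'", collapse = ", ")
  )
  opts <- c("-f", "GPKG", "-overwrite", "-where", where)
  if (!is.null(clip_wkt)) {
    # Clipping can split a line into several pieces, hence PROMOTE_TO_MULTI
    opts <- c(opts, "-clipsrc", clip_wkt, "-nlt", "PROMOTE_TO_MULTI")
  }
  # Layer name goes last, as in the lines_v2 call
  c(opts, layer)
}

vt_opts <- build_vt_options(
  layer = "lines",
  key = "highway",
  values = c("primary", "secondary", "tertiary"),
  clip_wkt = "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))" # placeholder WKT
)
vt_opts
```

In the real call, clip_wkt would be st_as_text(my_bbox_poly) and the result would be passed as vectortranslate_options = vt_opts. The helper does no escaping, so it is only suitable for trusted key and value strings.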

Graphical check

# Check result
tm_shape(my_bbox_poly) + 
  tm_borders(col = "darkred") + 
tm_shape(lines_v2) + 
  tm_lines(lwd = 2)

Created on 2021-03-31 by the reprex package (v1.0.0)

Summary:

If you need to import OSM data several times, then you should set a persistent download directory. That way you don't need to download an OSM extract every time you run a new query (unless the requested data are not included in any of the existing extracts).
If you need to import OSM lines covering a medium/small geographical region, I would suggest adopting the "query" approach (i.e. lines_v1).
The second approach (i.e. lines_v2) has several benefits: it's faster than the first one, especially for larger extracts, and, as you can see from the previous plot, it clips the lines to the box instead of selecting every road that intersects it. On the other hand, it's quite difficult to write the vectortranslate options from scratch (we are working on a more intuitive API, but it's still under development). Moreover, that option modifies the underlying structure of the .gpkg file (which may have relevant consequences). We are working on a solution for both problems, but you need to wait until version 0.3 or 0.4.
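
The difference between selecting and clipping mentioned in the last point can be reproduced on a toy example with sf. This is only an illustration of the two spatial operations, not of what osmextract or GDAL run internally; the box and the road below are made-up planar geometries with no CRS.

```r
library(sf)

# A unit square standing in for the bbox and a "road" that crosses it
box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)))
road <- st_sfc(st_linestring(rbind(c(-1, 0.5), c(2, 0.5))))

# Selecting (analogous to lines_v1): keep the whole road if it intersects the box
selected <- road[lengths(st_intersects(road, box)) > 0]

# Clipping (analogous to lines_v2): keep only the part inside the box
clipped <- st_intersection(road, box)

st_bbox(selected) # extends beyond the box
st_bbox(clipped)  # lies within the box
```

This matches the bounding boxes printed earlier: the one for lines_v1 extends slightly beyond the requested box, while the one for lines_v2 coincides with it.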

Check the package website, https://docs.ropensci.org/osmextract/, for more details behind osmextract.

Feel free to add any questions or comments here.