I have a shapefile and I want to load only the observations that satisfy some condition (to avoid unnecessarily loading the whole dataset). To do it using R's sf
package, I have to pass to the query
parameter of the st_read
function an SQL query. But the query parameter seems not to accept an SQL function (such as SUBSTRING
or LEFT
) to be used with it.
To give a concrete example, I am using some congressional districts shapefiles. I tried, for instance, the following code:
library(sf)
query <- "SELECT * FROM \"districts080\" WHERE SUBSTRING(\"ID\", 2, 2) = '01'"
dfr <- st_read(file.path('C:', 'Downloads', 'districts080.shp'),
query = query)
But it did not work properly, as I got the following error message:
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
Query execution failed, cannot open layer.
In addition: Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
GDAL Error 1: Undefined function 'SUBSTRING' used.
My question is: Is the SQL query that we are allowed to pass to the query
parameter of the st_read
function limited to cases that do not use an SQL function in the query? If that is the case, how would I do to load only the desired subset of the observations?
In OGR SQL dialect it's SUBSTR()
, though I'd assume for that specific case something like ID LIKE '_01%'
would be a more common choice.
Using {sf}
example Shapefile:
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
nc_shp <- system.file("shape/nc.shp", package="sf")
# SUBSTR(FIPS,2,3) = '700'
read_sf(nc_shp, query = "SELECT NAME, FIPS FROM \"nc\" WHERE SUBSTR(FIPS,2,3) = '700'")
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -81.74107 ymin: 34.80792 xmax: -79.23799 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 5 × 3
#> NAME FIPS geometry
#> <chr> <chr> <POLYGON [°]>
#> 1 Ashe 37009 ((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359,…
#> 2 Alleghany 37005 ((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504,…
#> 3 Alamance 37001 ((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699,…
#> 4 Alexander 37003 ((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -…
#> 5 Anson 37007 ((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311,…
# FIPS LIKE '_700%'
# `%`: any number of characters; `_`: any one character
read_sf(nc_shp, query = "SELECT NAME, FIPS FROM \"nc\" WHERE FIPS LIKE '_700%'")
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -81.74107 ymin: 34.80792 xmax: -79.23799 ymax: 36.58965
#> Geodetic CRS: NAD27
#> # A tibble: 5 × 3
#> NAME FIPS geometry
#> <chr> <chr> <POLYGON [°]>
#> 1 Ashe 37009 ((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359,…
#> 2 Alleghany 37005 ((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504,…
#> 3 Alamance 37001 ((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699,…
#> 4 Alexander 37003 ((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -…
#> 5 Anson 37007 ((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311,…
Created on 2023-11-30 with reprex v2.0.2