Search code examples
sqlrr-sf

Is it possible to pass an SQL query that contains an SQL function to the query parameter of the st_read function in R's sf package? What alternative?


I have a shapefile and I want to load only the observations that satisfy some condition (to avoid unnecessarily loading the whole dataset). To do it using R's sf package, I have to pass to the query parameter of the st_read function an SQL query. But the query parameter seems not to accept an SQL function (such as SUBSTRING or LEFT) to be used with it.

To give a concrete example, I am using some congressional districts shapefiles. I tried, for instance, the following code:

library(sf)

query <- "SELECT * FROM \"districts080\" WHERE SUBSTRING(\"ID\", 2, 2) = '01'"
dfr <- st_read(file.path('C:', 'Downloads', 'districts080.shp'), 
               query = query)

But it did not work properly, as I got the following error message:

Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  : 
  Query execution failed, cannot open layer.
In addition: Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Error 1: Undefined function 'SUBSTRING' used.

My question is: Is the SQL query that we are allowed to pass to the query parameter of the st_read function limited to cases that do not use an SQL function in the query? If that is the case, how would I do to load only the desired subset of the observations?


Solution

  • In OGR SQL dialect it's SUBSTR() , though I'd assume for that specific case something like ID LIKE '_01%' would be a more common choice.
    Using {sf} example Shapefile:

    library(sf)
    #> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
    nc_shp <- system.file("shape/nc.shp", package="sf")
    
    # SUBSTR(FIPS,2,3) = '700'
    read_sf(nc_shp, query = "SELECT NAME, FIPS FROM \"nc\" WHERE SUBSTR(FIPS,2,3) = '700'")
    #> Simple feature collection with 5 features and 2 fields
    #> Geometry type: POLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -81.74107 ymin: 34.80792 xmax: -79.23799 ymax: 36.58965
    #> Geodetic CRS:  NAD27
    #> # A tibble: 5 × 3
    #>   NAME      FIPS                                                        geometry
    #>   <chr>     <chr>                                                  <POLYGON [°]>
    #> 1 Ashe      37009 ((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359,…
    #> 2 Alleghany 37005 ((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504,…
    #> 3 Alamance  37001 ((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699,…
    #> 4 Alexander 37003 ((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -…
    #> 5 Anson     37007 ((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311,…
    
    # FIPS LIKE '_700%' 
    # `%`: any number of characters; `_`: any one character
    read_sf(nc_shp, query = "SELECT NAME, FIPS FROM \"nc\" WHERE FIPS LIKE '_700%'")
    #> Simple feature collection with 5 features and 2 fields
    #> Geometry type: POLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -81.74107 ymin: 34.80792 xmax: -79.23799 ymax: 36.58965
    #> Geodetic CRS:  NAD27
    #> # A tibble: 5 × 3
    #>   NAME      FIPS                                                        geometry
    #>   <chr>     <chr>                                                  <POLYGON [°]>
    #> 1 Ashe      37009 ((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359,…
    #> 2 Alleghany 37005 ((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504,…
    #> 3 Alamance  37001 ((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699,…
    #> 4 Alexander 37003 ((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -…
    #> 5 Anson     37007 ((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311,…
    

    Created on 2023-11-30 with reprex v2.0.2