Search code examples
roptimizationr-sf

Quickly filter down a grid of sf points according to a polygon


I want to make grids (in the sense of data frames of x- and y-coordinates) over the US, or regions of the US, throwing out any points in the original grid that are beyond the borders of the US. I have some code that seems to work, but it's quite slow when I shrink the grid increment down to 1 km (1e3) or so. How can this be done faster? Perhaps there's a way to build the simple feature collection that I need without a lapply or loop, or perhaps this can be done with a MULTIPOINT instead of a simple feature collection of POINTs.

library(sf)

crs.us.atlas = 2163 # https://epsg.io/2163

us = read_sf(paste0("/vsizip/",
    "/tmp/us.zip")) # from: https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip
# Filter down to the lower 48 + DC.
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = crs.us.atlas)
l = as.list(st_bbox(us))

increment = 1e5

g = expand.grid(
    x = seq(l$xmin, l$xmax, by = increment),
    y = seq(l$ymin, l$ymax, by = increment))

message("Running the slow part")
print(system.time(g <- g[0 < sapply(FUN = length, st_intersects(
    st_as_sfc(crs = crs.us.atlas, lapply(1 : nrow(g), function(i)
        st_point(as.numeric(g[i, c("x", "y")])))),
    us)),]))

print(nrow(g))

Solution

  • sf has a function st_make_grid that could help with this (in older versions of sf, it even automatically cropped to the polygon), but curiously, it's quite slow as of this writing.

    I can get reasonable performance without additional dependencies by using st_as_sf to convert g to a simple feature collection of points:

    g = st_as_sf(crs = crs.us.atlas, coords = c(1, 2), g)
    

    Then, following @agila's use of unlist, the slow part becomes

    print(system.time(g <- g[unlist(st_intersects(us, g)),]))
    

    which, with increment = 1e3, takes 3 minutes on a high-end server.