Why use st_intersection rather than st_intersects?

st_intersection is very slow compared to st_intersects. So why not use the latter instead of the former? Here's an example with a small toy dataset, but the difference in execution time is huge for my actual set of just 62,020 points intersected with an actual geographic region boundary. I have 24Gb of RAM and the st_intersects code takes a few seconds whereas the st_intersection code takes more than 15 minutes (possibly much more, I haven't had the patience to wait...). Does st_intersection do anything that I am not getting with st_intersects?

The below code handles sfc objects but I believe would work equally for sf objects.

library(sf)
library(dplyr)

# create square
s <- rbind(c(1, 1), c(10, 1), c(10, 10), c(1, 10), c(1, 1)) %>% list %>% st_polygon %>% st_sfc
# create random points
p <- runif(50, 0, 11) %>% cbind(runif(50, 0, 11)) %>% st_multipoint %>% st_sfc %>% st_cast("POINT")

# intersect points and square with st_intersection
st_intersection(p, s)

# intersect points and square with st_intersects (courtesy of https://stackoverflow.com/a/49304723/7114709)
p[st_intersects(p, s) %>% lengths > 0,]

Solution

The answer is that in general the two methods do different things, though in your particular case (finding the intersection of a collection of points and a polygon), st_intersects can be used to efficiently do the same job.

We can show the difference with a simple example modified from your own. We start with a square:

library(sf)
library(dplyr)

# create square
s <- rbind(c(1, 1), c(10, 1), c(10, 10), c(1, 10), c(1, 1)) %>% 
  list %>% 
  st_polygon %>% 
  st_sfc

plot(s)

Now we will create a rectangle and draw it on the same plot with a dotted outline:

# create rectangle
r <- rbind(c(-1, 2), c(11, 2), c(11, 4), c(-1, 4), c(-1, 2)) %>% 
  list %>% 
  st_polygon %>% 
  st_sfc

plot(r, add= TRUE, lty = 2)

Now we find the intersection of the two polygons and plot it in red:

# intersect points and square with st_intersection
i <- st_intersection(s, r)

plot(i, add = TRUE, lty = 2, col = "red")

When we examine the object i, we will see it is a new polygon:

i
#> Geometry set for 1 feature 
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: 1 ymin: 2 xmax: 10 ymax: 4
#> epsg (SRID):    NA
#> proj4string:    NA
#> POLYGON ((10 4, 10 2, 1 2, 1 4, 10 4))

Whereas, if we use st_intersects, we only get a logical result telling us whether there is indeed an intersection between r and s. If we try to use this to subset r to find the intersection, we don't get the intersected shape, we just get our original rectangle back:

r[which(unlist(st_intersects(s, r)) == 1)]
#> Geometry set for 1 feature 
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -1 ymin: 2 xmax: 11 ymax: 4
#> epsg (SRID):    NA
#> proj4string:    NA
#> POLYGON ((-1 2, 11 2, 11 4, -1 4, -1 2))

The situation that you have is different, because you are trying to find a subset of points that intersect a polygon. Is this case, the intersection of a group of points with a polygon is the same as the subset that meet the criterion st_intersects.

So it is great that you have found a valid way of getting a quicker intersection. Just be aware this will only work with collections of points intersecting a polygon.