Search code examples
rgeospatialr-sftopology

Why use st_intersection rather than st_intersects?


st_intersection is very slow compared to st_intersects. So why not use the latter instead of the former? Here's an example with a small toy dataset, but the difference in execution time is huge for my actual set of just 62,020 points intersected with an actual geographic region boundary. I have 24Gb of RAM and the st_intersects code takes a few seconds whereas the st_intersection code takes more than 15 minutes (possibly much more, I haven't had the patience to wait...). Does st_intersection do anything that I am not getting with st_intersects?

The below code handles sfc objects but I believe would work equally for sf objects.

library(sf)
library(dplyr)

# create square
s <- rbind(c(1, 1), c(10, 1), c(10, 10), c(1, 10), c(1, 1)) %>% list %>% st_polygon %>% st_sfc
# create random points
p <- runif(50, 0, 11) %>% cbind(runif(50, 0, 11)) %>% st_multipoint %>% st_sfc %>% st_cast("POINT")

# intersect points and square with st_intersection
st_intersection(p, s)

# intersect points and square with st_intersects (courtesy of https://stackoverflow.com/a/49304723/7114709)
p[st_intersects(p, s) %>% lengths > 0,]

Solution

  • The answer is that in general the two methods do different things, though in your particular case (finding the intersection of a collection of points and a polygon), st_intersects can be used to efficiently do the same job.

    We can show the difference with a simple example modified from your own. We start with a square:

    library(sf)
    library(dplyr)
    
    # create square
    s <- rbind(c(1, 1), c(10, 1), c(10, 10), c(1, 10), c(1, 1)) %>% 
      list %>% 
      st_polygon %>% 
      st_sfc
    
    plot(s)
    

    enter image description here

    Now we will create a rectangle and draw it on the same plot with a dotted outline:

    # create rectangle
    r <- rbind(c(-1, 2), c(11, 2), c(11, 4), c(-1, 4), c(-1, 2)) %>% 
      list %>% 
      st_polygon %>% 
      st_sfc
    
    plot(r, add= TRUE, lty = 2)
    

    enter image description here

    Now we find the intersection of the two polygons and plot it in red:

    # intersect points and square with st_intersection
    i <- st_intersection(s, r)
    
    plot(i, add = TRUE, lty = 2, col = "red")
    

    enter image description here

    When we examine the object i, we will see it is a new polygon:

    i
    #> Geometry set for 1 feature 
    #> geometry type:  POLYGON
    #> dimension:      XY
    #> bbox:           xmin: 1 ymin: 2 xmax: 10 ymax: 4
    #> epsg (SRID):    NA
    #> proj4string:    NA
    #> POLYGON ((10 4, 10 2, 1 2, 1 4, 10 4))
    

    Whereas, if we use st_intersects, we only get a logical result telling us whether there is indeed an intersection between r and s. If we try to use this to subset r to find the intersection, we don't get the intersected shape, we just get our original rectangle back:

    r[which(unlist(st_intersects(s, r)) == 1)]
    #> Geometry set for 1 feature 
    #> geometry type:  POLYGON
    #> dimension:      XY
    #> bbox:           xmin: -1 ymin: 2 xmax: 11 ymax: 4
    #> epsg (SRID):    NA
    #> proj4string:    NA
    #> POLYGON ((-1 2, 11 2, 11 4, -1 4, -1 2))
    

    The situation that you have is different, because you are trying to find a subset of points that intersect a polygon. Is this case, the intersection of a group of points with a polygon is the same as the subset that meet the criterion st_intersects.

    So it is great that you have found a valid way of getting a quicker intersection. Just be aware this will only work with collections of points intersecting a polygon.