Search code examples
rspatialr-sfr-sp

R: st_within is taking too long for computation of spatial object


I have two shp files.

I want to know only those shpfile_1 falling completely inside (100%) of shpfile_2.

The yellow polygon : 100% inside

library(sf)
shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
shp_1   <- st_transform(shp_1,2154);shp_1
shp_1   <-  st_union(shp_1)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc,2154)
shp_2   <- nc[1,]
shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=5000)

within <- st_within(shp_2 , shp_1)


The problem is st_within is taking too much time for 6 Million of polygon.

Is there any way to make the process fast?


Solution

  • I'm not sure it will help, but it might worth investigating changing your polygons (shp_1) to lines. If you have lots of small polygones in very big polygons, it might worth it.

    So your approach:

    library(sf)
    shp_1   <- st_read(system.file("shape/nc.shp", package="sf"))
    shp_1   <- st_transform(shp_1,2154);shp_1
    shp_1   <-  st_union(shp_1)
    
    nc <- st_read(system.file("shape/nc.shp", package="sf"))
    nc <- st_transform(nc,2154)
    shp_2   <- nc[1,]
    shp_2   <-st_make_grid(st_as_sfc(st_bbox(shp_2)),cellsize=1000)
    within <- st_within(shp_2 , shp_1)
    within_TF <- sapply(within, function(xx) length(xx)!=0)
    

    The idea is to transform the shp_a polygon into a line shape, than use st_intersects to find all smaller shp_2 polygons that touches the line and remove them from the original shape. Now, if you shp_2 is larger than shp_1 you have first to extract all smaller polygons that do not intersects. This is done with st_intersects and might actually defeat the purpose as it may be slow... Anyway, in this example it's some time better.

    inter <- st_intersects(shp_2, shp_1)   ## find the intersects
    inter_TF <- sapply(inter, function(xx) length(xx)!=0)  ## prepare the filter
    shp_2_touches <- shp_2[inter_TF,]   ## apply the filter
    
    shp_1_line <- st_cast(shp_1, "MULTILINESTRING")  ## convert to lines
    inter_line <- st_intersects(shp_2_touches, shp_1_line)  ## find the polygons that touches the line
    inter_line_TF <- sapply(inter_line, function(xx) length(xx)!=0)  ## prepare the filter
    

    With that done, you can see that both extraction are the same:

    identical(shp_2[within_TF,], shp_2_touches[!inter_line_TF,])
    [1] TRUE
    

    For speed, it's not a super gain, but on my horrible computer, it was somehow faster...

    microbenchmark::microbenchmark(within = st_within(shp_2 , shp_1),
                                   multiline = {
                                     inter <- st_intersects(shp_2, shp_1)
                                     inter_TF <- sapply(inter, function(xx) length(xx)!=0)
                                     shp_2_touches <- shp_2[inter_TF,]
                                     
                                     shp_1_line <- st_cast(shp_1, "MULTILINESTRING")
                                     inter_line <- st_intersects(shp_2_touches, shp_1_line)
                                   })
    Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
        within 405.6318 466.0014 760.2502 617.4705 817.4921 2177.999   100
     multiline 197.9589 258.3050 523.7598 399.6880 659.9645 2239.103   100
    

    There is not much way around st_intersects. It generally perform pretty well, but if it's slow in your case, there is not that many way around it. In most cases, you will want to reduce the size of your problem by doing multiple smaller intersection and data reduction.