Search code examples
rr-sf

Why does st_join(ob1, ob2, left = TRUE) return an sf object with more features than ob1?


I have two sf objects with multiple polygon features. The two objects have attributes that I would like to join to each other using the polygons. I have managed to do so with st_join(), but not in the way that I expected.

When using st_join(), I initially expected that if I specified the argument st_join(x, y, left = TRUE), that the resulting object would have the same number of features as x if all the geometries in x are contained in y. For example:

library("sf")
library("tidyverse")

map1 <- st_as_sfc(c("POLYGON((0 0 , 0 1 , -1 1 , -1 0, 0 0))",
                    "POLYGON((0 0 , 0 2 , 2 2 , 2 0, 0 0 ))", 
                    "POLYGON((0 0 , 0 -0.5 , -0.5 -0.5 , -0.5 0, 0 0))")) %>% 
  st_sf(ID = paste0("poly", 1:3),
        att1 = c(letters[1:3]))

map2 <- st_as_sfc(c("POLYGON((0 0 , 0 0.5 , -0.5 .5 , -0.5 0, 0 0))",
                    "POLYGON((-0.5 0 , -0.5 1 , -1 1 , -1 0 , -0.5 0))",
                    "POLYGON((0 0 , 0 1 , 1 1 , 1 0 , 0 0))",
                    "POLYGON((1 1 , 1 2 , 2 2 , 2 1 , 1 1))",
                    "POLYGON((0 0 , 0 -0.25 , -0.25 -0.25 , -0.25 0 , 0 0))")) %>% 
  st_sf(ID = paste0("poly", 4:8)) %>%
  mutate(att2 = c(LETTERS[1:5]))

map3 <- st_join(map2,
                map1,
                left = TRUE)

My expectation would be that map3 would have 5 features because all the polygons in map2 are contained uniquely in features of map1. This is however not the case, and polygons that do not overlap appear to have been joined with each other:

Simple feature collection with 12 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -1 ymin: -0.25 xmax: 2 ymax: 2
CRS:           NA
First 10 features:
     ID.x att2  ID.y att1                              .
1   poly4    A poly1    a POLYGON ((0 0, 0 0.5, -0.5 ...
1.1 poly4    A poly2    b POLYGON ((0 0, 0 0.5, -0.5 ...
1.2 poly4    A poly3    c POLYGON ((0 0, 0 0.5, -0.5 ...
2   poly5    B poly1    a POLYGON ((-0.5 0, -0.5 1, -...
2.1 poly5    B poly3    c POLYGON ((-0.5 0, -0.5 1, -...
3   poly6    C poly1    a POLYGON ((0 0, 0 1, 1 1, 1 ...
3.1 poly6    C poly2    b POLYGON ((0 0, 0 1, 1 1, 1 ...
3.2 poly6    C poly3    c POLYGON ((0 0, 0 1, 1 1, 1 ...
4   poly7    D poly2    b POLYGON ((1 1, 1 2, 2 2, 2 ...
5   poly8    E poly1    a POLYGON ((0 0, 0 -0.25, -0....

I have managed to get the expected result by using the argument largest = TRUE as follows:

map4 <- st_join(map2,
                map1,
                left = TRUE,
                largest = TRUE)

I however don't understand why this gives me the desired output but omitting largest = TRUE does not.

Thanks for taking the time to help!


Solution

  • Default predicate for st_join() is st_intersects, which is also TRUE for touching geometries.

    intersects - A and B are not disjoint
    disjoint - A and B have no points in common
    ( https://r-spatial.org/book/03-Geometries.html#sec-de9im )

    In your case, you seem to be after st_within.

    within - None of the points of B are outside A

    Lets first visualize how those features relate:

    library("sf")
    #> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
    library("tidyverse")
    
    map1 <- st_as_sfc(c("POLYGON((0 0 , 0 1 , -1 1 , -1 0, 0 0))",
                        "POLYGON((0 0 , 0 2 , 2 2 , 2 0, 0 0 ))", 
                        "POLYGON((0 0 , 0 -0.5 , -0.5 -0.5 , -0.5 0, 0 0))")) %>% 
      st_sf(ID = paste0("poly", 1:3),
            att1 = c(letters[1:3]))
    
    map2 <- st_as_sfc(c("POLYGON((0 0 , 0 0.5 , -0.5 .5 , -0.5 0, 0 0))",
                        "POLYGON((-0.5 0 , -0.5 1 , -1 1 , -1 0 , -0.5 0))",
                        "POLYGON((0 0 , 0 1 , 1 1 , 1 0 , 0 0))",
                        "POLYGON((1 1 , 1 2 , 2 2 , 2 1 , 1 1))",
                        "POLYGON((0 0 , 0 -0.25 , -0.25 -0.25 , -0.25 0 , 0 0))")) %>% 
      st_sf(ID = paste0("poly", 4:8)) %>%
      mutate(att2 = c(LETTERS[1:5]))
    
    ggplot() +
      geom_sf(data = map1, aes(fill = "map1"), linewidth = 5,) +
      geom_sf(data = map2, aes(fill = "map2"), color = "white", , linewidth = 1, alpha = .2) +
      geom_sf_label(data = map1, aes(label = ID, fill = "map1")) +
      geom_sf_label(data = map2, aes(label = ID, fill = "map2"), color = "white") +
      theme_minimal()
    

    With st_join(... , join = st_intersects) (default):

    
    st_join(map2, map1, left = TRUE, join = st_intersects)
    #> Simple feature collection with 12 features and 4 fields
    #> Geometry type: POLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -1 ymin: -0.25 xmax: 2 ymax: 2
    #> CRS:           NA
    #> First 10 features:
    #>      ID.x att2  ID.y att1                              .
    #> 1   poly4    A poly1    a POLYGON ((0 0, 0 0.5, -0.5 ...
    #> 1.1 poly4    A poly2    b POLYGON ((0 0, 0 0.5, -0.5 ...
    #> 1.2 poly4    A poly3    c POLYGON ((0 0, 0 0.5, -0.5 ...
    #> 2   poly5    B poly1    a POLYGON ((-0.5 0, -0.5 1, -...
    #> 2.1 poly5    B poly3    c POLYGON ((-0.5 0, -0.5 1, -...
    #> 3   poly6    C poly1    a POLYGON ((0 0, 0 1, 1 1, 1 ...
    #> 3.1 poly6    C poly2    b POLYGON ((0 0, 0 1, 1 1, 1 ...
    #> 3.2 poly6    C poly3    c POLYGON ((0 0, 0 1, 1 1, 1 ...
    #> 4   poly7    D poly2    b POLYGON ((1 1, 1 2, 2 2, 2 ...
    #> 5   poly8    E poly1    a POLYGON ((0 0, 0 -0.25, -0....
    

    With st_join(... , join = st_within):

    st_join(map2, map1, left = TRUE, join = st_within)
    #> Simple feature collection with 5 features and 4 fields
    #> Geometry type: POLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -1 ymin: -0.25 xmax: 2 ymax: 2
    #> CRS:           NA
    #>    ID.x att2  ID.y att1                              .
    #> 1 poly4    A poly1    a POLYGON ((0 0, 0 0.5, -0.5 ...
    #> 2 poly5    B poly1    a POLYGON ((-0.5 0, -0.5 1, -...
    #> 3 poly6    C poly2    b POLYGON ((0 0, 0 1, 1 1, 1 ...
    #> 4 poly7    D poly2    b POLYGON ((1 1, 1 2, 2 2, 2 ...
    #> 5 poly8    E poly3    c POLYGON ((0 0, 0 -0.25, -0....
    

    Created on 2024-04-15 with reprex v2.1.0