Search code examples
rr-sf

Error when subsetting sfc objects with explicit binary predicate


I don't understand why the last two examples in the following code chunk fail

library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

p1 <- st_sfc(st_point(c(0, 0)))
p2 <- st_sfc(st_point(c(1, 1)))

p1[p2]
#> Geometry set for 0 features 
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> CRS:           NA
p1[p2, op = st_intersects]
#> Error in `[.default`(p1, p2, op = st_intersects): incorrect number of dimensions
p1[p2, , op = st_intersects]
#> Error in `[.default`(p1, p2, , op = st_intersects): incorrect number of dimensions

when the default operation specified by [.sfc is exactly st_intersects.

sf:::`[.sfc`
#> function (x, i, j, ..., op = st_intersects) 
#> {
#>     if (!missing(i) && (inherits(i, "sf") || inherits(i, "sfc") || 
#>         inherits(i, "sfg"))) 
#>         i = lengths(op(x, i, ...)) != 0
#>     st_sfc(NextMethod(), crs = st_crs(x), precision = st_precision(x), 
#>         dim = if (length(x)) 
#>             class(x[[1]])[1]
#>         else "XY")
#> }
#> <bytecode: 0x00000255ad682df8>
#> <environment: namespace:sf>

Created on 2024-01-21 with reprex v2.0.2

Could you please help me? How should I specify the op parameter in the correct way when subsetting sfc object?

EDIT (2024-01-23)

Just for future reference, I want to point out that the behaviour described in this question has been adjusted/fixed/modified in the sf package. See https://github.com/r-spatial/sf/issues/2320 for more details.


Solution

  • The problem here occurs because `[.sfc` uses NextMethod() internally. The way this works is that the arguments you explicitly passed to `[.sfc` are passed on to the default subsetting method for lists. This is the primitive "[" function, implemented in the C code function do_subset_dflt.

    Unusually for an R function, this ignores the names of any arguments (with the single exception of drop), and goes by order only:

    my_list <- list(a = 1, b = 2)
    
    my_list[the_name_of_this_argument_is_ignored = 2]
    #> $b
    #> [1] 2
    
    my_list[the_name_of_this_argument_is_ignored = 2, drop = FALSE]
    #> $b
    #> [1] 2
    

    This even leads to the confusing result

    my_mat <- matrix(1:4, 2, 2)
    
    my_mat[i = 1, j = 2]
    #> [1] 3
    
    my_mat[j = 1, i = 2]
    #> [1] 3
    

    If the number of (non-drop) arguments exceeds the number of dimensions of our object, we get exactly your error:

    my_list[2, op = sf::st_intersects]
    #> Error in my_list[2, op = sf::st_intersects] : 
    #>   incorrect number of dimensions
    

    It seems confusing that a default argument works without a problem, yet making exactly the same function call but explicitly naming the default argument throws an error. This is a result of how NextMethod works - it only passes on the symbols you specifically passed to the initial call, ignoring any default parameters.

    The whole situation can be shown quite easily by creating a new class with its own subsetting method:

    test <- function(x) structure(matrix(x, 2, 2), class = "test")
    
    `[.test` <- function(x, i, j, ..., nag = FALSE) {
      if(nag) cat("This won't work now\n")
      NextMethod()
    }
    
    x <- test(5)
    
    x[1, 1]
    #> [1] 5
    
    x[1, 1, nag = TRUE]
    #> This won't work now
    #> Error in `[.default`(x, 1, 1, nag = TRUE): incorrect number of dimensions
    
    x[1, 1, nag = FALSE]
    #> Error in `[.default`(x, 1, 1, nag = FALSE): incorrect number of dimensions
    

    There isn't any direct work-around for this because [ is a primitive function. The use of NextMethod means that the named argument op is not, in fact, available for use by the end-user. Since the function is not exported and is undocumented, this is not really a bug, but it seems like a strange design decision to have op passed as an argument rather than hard-coding `st_intersects in the body of the function.

    In the meantime, the best thing is to avoid using the op argument, and just using the predicate function directly:

    library(sf)
    #> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
    
    p1 <- st_sfc(st_point(c(0, 0)))
    p2 <- st_sfc(st_point(c(1, 1)))
    
    p1[lengths(st_intersects(p1, p2)) != 0]
    #> Geometry set for 0 features 
    #> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
    #> CRS:           NA
    

    Created on 2024-01-21 with reprex v2.0.2