Search code examples
rnearest-neighborterra

terra nearest() returns NA only for `to_id` column when centroids = FALSE


I have a number of geographical points in the sea and a map with Swedish sea areas. I want to know in which sea area each point lies and for the points outside sea areas the distance to nearest sea area including ID for the nearest sea area.

library(terra)
#> terra 1.7.71

map <- vect("/vsizip//vsicurl/https://www.smhi.se/polopoly_fs/1.140307!/Havsomr_SVAR_2016_3b.zip")

pts <- data.frame(id = 11:25,
                  lon = c(15.29416656, 15.10333347, 15.27083302, 15.27250004, 15.1291666, 
                          15.31333351, 15.24166679, 15.30716705, 15.32333374, 15.38833332, 
                          15.41416645, 15.40666676, 15.27166653, 15.26083374, 15.1916666),
                  lat = c(56.1566658, 56.15000153, 56.15250015, 56.13999939, 56.16383362, 
                          56.15333176, 56.1558342, 56.17416763, 56.12366486, 56.14250183, 
                          56.15916824, 56.17666626, 56.17666626, 56.17300034, 56.1570015)) |> 
  vect(geom = c("lon", "lat"), crs = "EPSG:4326") |> 
  project("EPSG:3006")

As you can see when you run this simple code that to_id is always NA and of type logical. I want to have the column "HID" from map in column to_id (or other ID from map). Can I get that in some way?

npol <- nearest(pts, map[, "HID"], centroids = FALSE)
npol
#>  class       : SpatVector 
#>  geometry    : points 
#>  dimensions  : 15, 5  (geometries, attributes)
#>  extent      : 506419.7, 525716.6, 6219890, 6225817  (xmin, xmax, ymin, ymax)
#>  coord. ref. : SWEREF99 TM (EPSG:3006) 
#>  names       : from_id    from_x    from_y     to_id distance
#>  type        :   <int>     <num>     <num> <logical>    <num>
#>  values      :       1 5.183e+05 6.224e+06      <NA>        0
#>                      2 5.064e+05 6.223e+06      <NA>        0
#>                      3 5.168e+05 6.223e+06      <NA>        0

Solution

  • Unfortunately, the documentation at ?terra::nearest does not provide any examples for nearest(), especially with centroids = FALSE. Personally, I would expect the distance to be zero if the point is within a polygon and otherwise to correspond to the distance to the nearest node of a polygon - but I can't tell for sure how this is meant to be. And obviously this does not seem to meet my expectations.

    However, for this data we can simply compute the complete distance matrix using terra::distance() and just pick the information we are interested in: the index of the relevant polygon (which you can use for indexing your "HID" column from map) and the distance itself:

    distm <- distance(pts, map[, "HID"])
    
    pts[, "to_id"] <- apply(distm, MARGIN = 1, FUN = which.min)
    pts[, "to_HID"] <- map[pts$to_id, ]$HID
    pts[, "distance"] <- apply(distm, MARGIN = 1, FUN = min)
    
    pts
    #>  class       : SpatVector 
    #>  geometry    : points 
    #>  dimensions  : 15, 4  (geometries, attributes)
    #>  extent      : 506419.7, 525724.3, 6219890, 6225817  (xmin, xmax, ymin, ymax)
    #>  coord. ref. : SWEREF99 TM (EPSG:3006) 
    #>  names       :    id to_id        to_HID distance
    #>  type        : <int> <int>         <chr>    <num>
    #>  values      :    11   279 560940-151740        0
    #>                   12   288 560850-150580        0
    #>                   13   279 560940-151740        0
    

    According to this output, 3 of your 15 points are on-shore.

    Edit:

    Just to reproduce this out of curiousity using {sf} instead:

    library(sf)
    #> Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
    
    map_sf <- st_as_sf(map)
    
    pts_sf <- st_as_sf(pts)
    
    st_nearest_feature(pts_sf, map_sf)
    #>  [1] 279 288 279 371 289 279 327 279 371 244 280 280 279 279 327
    
    st_distance(pts_sf, map_sf) |> apply(MARGIN = 1, FUN = min)
    #>  [1]  0.000000  0.000000  0.000000  0.000000 58.765524  0.000000  0.000000
    #>  [8] 27.636855  0.000000  0.000000  7.755143  0.000000  0.000000  0.000000
    #> [15]  0.000000
    

    And a last take using {nngeo}:

    library(nngeo)
    
    pts_nn <- st_nn(pts_sf, map_sf, returnDist = TRUE)
    lapply(pts_nn, unlist)
    #> $nn
    #>  [1] 279 288 279 371 289 279 327 279 371 244 280 280 279 279 327
    #> 
    #> $dist
    #>  [1]  0.000000  0.000000  0.000000  0.000000 58.765524  0.000000  0.000000
    #>  [8] 27.636855  0.000000  0.000000  7.755143  0.000000  0.000000  0.000000
    #> [15]  0.000000