Search code examples
rgeolocationgeospatialdistancer-sf

How to solve this problem with the st_distance function from sf giving an error when inspecting the data?


I have this data:

df <- data.frame (pc_home = c(1042, 2052, NA, 4021, 9423, NA, 1502, 5942),
                  pc_work = c(NA, 2105, NA, 4352, 8984, NA, 1495, 6050),
                  centroid_home = c(c(122239.347627534, 487236.185950724), c(121552.622967901, 487511.344167049), c(NA, NA), c(120168.155075649, 489952.753092173), c(119154.137476474, 489381.429089547), c(NA,NA), c(120723.216386427, 487950.166456445), c(120570.498333358, 487104.749088018))
                  centroid_work = c(c(NA, NA), c(121337.696586159, 486235.561338213), c(NA, NA), c(123060.850070339, 486752.640463608), c(124354.37048732, 487473.329840357), c(NA,NA), c(123171.113425247, 488458.596501631), c(123952.971290978, 489249.568149519))
                  )

The centroids were calculated using st_centroid() on a shapefile. The c(NA,NA) were the result of missing postal codes used to calculate centroids.

And I use this code:

library(sf)
df <- df %>%
  mutate(dist_hw = st_distance(centroid_home, centroid_work))

No errors, but inspecting the data, I get weird results. In the dataframe view I see no results, and when I try to sort (to see if there are any results), I get this error:

Error in `stop_subscript()`:
! Can't subset elements that don't exist.
x Locations 4324, 7679, 11034, 13428, 16783, etc. don't exist.
i There are only 3355 elements.

enter image description here

I wonder if the error is caused by the NAs or something else?

If it is caused by the NAs, how do I solve that?

All I want is to calculate distance between the points.


Solution

  • Hard to do with the sample data provided. It needs to be in an sf style data frame, and needs a crs as well. Assuming you have those, but had difficulty posting them, the solution below should work. Your sf object needs to have two geometry columns, and it looks like yours should.

    Using st_distance() should work, with the by_element = T argument. Examples below to either use st_distance() directly, or in dplyr::mutate to add a column for distance to the sf data frame.

    library(sf)
    library(tidyverse)
    
    #### Making reproducible data
    # get the nc data, make the geometry column a point with st_centroid
    nc = st_read(system.file("shape/nc.shp", package="sf")) %>%
      select(NAME) %>% st_centroid()
    
    # jitter the centroid point and add (cbind) as a second geometry column
    geo2 <- st_geometry(nc) %>% st_jitter()
    nc <- cbind(nc, geo2)
    ####
    
    # Find the distance between the points, row-by-row
    st_distance(nc$geometry,nc$geometry.1, by_element = T) %>% head()
    #> Units: [m]
    #> [1]  965.8162 2030.5782 1833.3081 1909.5538 1408.7908  820.0569
    
    # or use mutate to add a column to the sf df.
    nc %>% mutate(dist = st_distance(geometry, geometry.1, by_element = T))
    #> Simple feature collection with 100 features and 2 fields
    #> Active geometry column: geometry
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
    #> Geodetic CRS:  NAD27
    #> First 10 features:
    #>           NAME                   geometry                 geometry.1
    #> 1         Ashe  POINT (-81.49823 36.4314) POINT (-81.49685 36.44001)
    #> 2    Alleghany POINT (-81.12513 36.49111) POINT (-81.13681 36.47545)
    #> 3        Surry POINT (-80.68573 36.41252) POINT (-80.69163 36.39673)
    #> 4    Currituck POINT (-76.02719 36.40714) POINT (-76.02305 36.39029)
    #> 5  Northampton POINT (-77.41046 36.42236) POINT (-77.40909 36.40974)
    #> 6     Hertford POINT (-76.99472 36.36142) POINT (-76.98777 36.36623)
    #> 7       Camden POINT (-76.23402 36.40122)  POINT (-76.23969 36.4181)
    #> 8        Gates POINT (-76.70446 36.44428) POINT (-76.70953 36.45603)
    #> 9       Warren POINT (-78.11042 36.39693) POINT (-78.11619 36.38541)
    #> 10      Stokes POINT (-80.23429 36.40042) POINT (-80.24365 36.39904)
    #>             dist
    #> 1   965.8162 [m]
    #> 2  2030.5782 [m]
    #> 3  1833.3081 [m]
    #> 4  1909.5538 [m]
    #> 5  1408.7908 [m]
    #> 6   820.0569 [m]
    #> 7  1944.8192 [m]
    #> 8  1382.9058 [m]
    #> 9  1381.7946 [m]
    #> 10  851.1106 [m]
    

    Created on 2022-04-13 by the reprex package (v2.0.1)