
How can I find the distance between consecutive coordinates in R?


I have a dataframe similar in structure to the one created below:

library(dplyr)  # needed for %>% and arrange()
id <- rep(c("a", "b", "c", "d"), each = 3)
date <- seq(as.Date("2019-01-30"), as.Date("2019-02-10"), by="days")
lon <- c(-87.1234, -86.54980, -86.234059, -87.2568, -87.65468, -86.54980, -86.234059, -86.16486, -87.156546, -86.234059, -86.16486, -87.156546)
lat <- c(26.458, 26.156, 25.468, 25.157, 24.154, 24.689, 25.575, 25.468, 25.157, 24.154, 26.789, 26.456)
data <- data.frame(id, date, lon, lat)
data <- data %>% arrange(id, date)

I would like to measure the distance between consecutive points, grouped by id. I do not want a distance matrix, which is why I avoid raster::pointDistance. I tried splitting each unique id into its own sf data frame (in reality I have ~400 ids, so I more or less have to split them up for the actual calculation because of the data size) and using the following code:

# put the rows for each id in their own data frame
un1 <- unique(data$id)
for (i in seq_along(un1))
  assign(paste0("id", i), subset(data, id == un1[i]))
# create point distance function
pt.dist <- function(dat) {
  dat$pt.dist <- st_distance(dat, by_element = TRUE)
  return(dat)
}
# run the function across every data frame in the working environment
e <- .GlobalEnv
nms <- ls(pattern = "id", envir = e)
for (nm in nms) e[[nm]] <- pt.dist(e[[nm]])

When I run this, all I get is a geometry column with lon and lat listed as a pair. I have also tried segclust2d::calc_dist, like below:

distance <- function(dat){calc_dist(dat, coord.names = c("lon", "lat"), smoothed = FALSE)}
for(nm in nms) e[[nm]] <- distance(e[[nm]])

which returns a column where the distances are all 0 meters.

Any help would be greatly appreciated!


Solution

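  • The geosphere::dist* functions support this. When given a single two-column (lon, lat) matrix and no second set of points, they return the distance between each pair of consecutive rows, so n points yield n - 1 distances; that is why the code below prepends NA for the first row of each group. The most accurate is distVincentyEllipsoid (though it may be slower with larger data), followed by distVincentySphere and distHaversine. All of them return distances in meters.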
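    To make "distance between consecutive rows" concrete, here is a minimal base-R sketch of the same idea using the haversine formula. It assumes a spherical earth, so it corresponds to distHaversine rather than the ellipsoidal distVincentyEllipsoid, and its numbers will differ slightly from the output shown below. The helper name consecutive_haversine is hypothetical, and the radius of 6378137 m is an assumption chosen to match geosphere's default for distHaversine.

```r
# Hypothetical helper (not part of geosphere): haversine distance in meters
# between each row and the next, assuming a sphere of radius r.
consecutive_haversine <- function(lon, lat, r = 6378137) {
  n <- length(lon)
  if (n < 2) return(numeric(0))
  to_rad <- pi / 180
  # pair row k with row k + 1: "from" = all but the last, "to" = all but the first
  lon1 <- lon[-n] * to_rad; lat1 <- lat[-n] * to_rad
  lon2 <- lon[-1] * to_rad; lat2 <- lat[-1] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
       cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# the three points of id "a": 3 points give 2 distances, so prepend NA
lon <- c(-87.1234, -86.54980, -86.234059)
lat <- c(26.458, 26.156, 25.468)
d <- c(NA, consecutive_haversine(lon, lat))
d
```

    The NA in the first position plays the same role as the c(NA, ...) in the answer's code: it keeps the result the same length as the group so it can be stored as a column.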

    dplyr

    library(dplyr)
    data %>%
      group_by(id) %>%
      mutate(dist = c(NA, geosphere::distVincentyEllipsoid(cbind(lon, lat)))) %>%
      ungroup()
    # # A tibble: 12 x 5
    #    id    date         lon   lat    dist
    #    <chr> <date>     <dbl> <dbl>   <dbl>
    #  1 a     2019-01-30 -87.1  26.5     NA 
    #  2 a     2019-01-31 -86.5  26.2  66334.
    #  3 a     2019-02-01 -86.2  25.5  82534.
    #  4 b     2019-02-02 -87.3  25.2     NA 
    #  5 b     2019-02-03 -87.7  24.2 118175.
    #  6 b     2019-02-04 -86.5  24.7 126758.
    #  7 c     2019-02-05 -86.2  25.6     NA 
    #  8 c     2019-02-06 -86.2  25.5  13744.
    #  9 c     2019-02-07 -87.2  25.2 105632.
    # 10 d     2019-02-08 -86.2  24.2     NA 
    # 11 d     2019-02-09 -86.2  26.8 291988.
    # 12 d     2019-02-10 -87.2  26.5 105423.
    

    base R

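    We can get the same result in base R with ave. Because ave hands FUN only a single vector per group, we pass row indices as the first argument and use them inside FUN to pick out the lon and lat columns. Because ave writes the results back into a copy of that first argument, its type matters (a character vector, for example, would turn the distances into strings), so we pass the row indices as numeric.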
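    A stripped-down illustration of the row-index trick, with diff() standing in for the distance calculation so that it runs with base R alone. The toy data frame toy and its columns g and x are hypothetical.

```r
# Hypothetical toy data: two groups; ave() never sees column x directly.
toy <- data.frame(g = c("a", "a", "b", "b", "b"),
                  x = c(1, 4, 10, 15, 30))

# Pass numeric row indices to ave(); inside FUN, use them to look up x
# for the current group and compute consecutive differences.
toy$step <- ave(
  as.numeric(seq_len(nrow(toy))),
  toy$g,
  FUN = function(i) c(NA, diff(toy$x[i]))
)
toy$step
# [1] NA  3 NA  5 15
```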

    data$dist2 <- ave(
      as.numeric(seq_len(nrow(data))),  # values to use in calc
      data$id,                          # grouping variable(s)
      FUN = function(i) c(NA, geosphere::distVincentyEllipsoid(data[i, c("lon", "lat")]))
    )
    data
    #    id       date       lon    lat     dist2
    # 1   a 2019-01-30 -87.12340 26.458        NA
    # 2   a 2019-01-31 -86.54980 26.156  66334.13
    # 3   a 2019-02-01 -86.23406 25.468  82534.47
    # 4   b 2019-02-02 -87.25680 25.157        NA
    # 5   b 2019-02-03 -87.65468 24.154 118175.40
    # 6   b 2019-02-04 -86.54980 24.689 126757.93
    # 7   c 2019-02-05 -86.23406 25.575        NA
    # 8   c 2019-02-06 -86.16486 25.468  13743.74
    # 9   c 2019-02-07 -87.15655 25.157 105631.82
    # 10  d 2019-02-08 -86.23406 24.154        NA
    # 11  d 2019-02-09 -86.16486 26.789 291988.42
    # 12  d 2019-02-10 -87.15655 26.456 105422.87
    

    Internally, for the "b" group the second call to FUN receives i = c(4, 5, 6). The indices within a group do not need to be contiguous; in fact, one strength of ave over other group-processing functions is that it always returns its result in the same order as the input, so it is safe to assign that result back to the original frame. The rows within each group do still need to be in date order for "consecutive" to be meaningful, which is why data is sorted by id and date first.
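    The order-preserving behaviour can be checked on a toy vector whose groups are interleaved, so each group's row indices are non-contiguous. The vector g is hypothetical; each row gets its position within its own group, and the results land back on the rows they came from.

```r
# Hypothetical interleaved groups: "a" is rows 1, 3, 5 and "b" is rows 2, 4.
g <- c("a", "b", "a", "b", "a")

# Position of each row within its own group.
pos <- ave(as.numeric(seq_along(g)), g, FUN = function(i) seq_along(i))
pos
# [1] 1 1 2 2 3

# First row index each group received: 1 for "a", 2 for "b".
first_row <- ave(as.numeric(seq_along(g)), g, FUN = function(i) rep(i[1], length(i)))
first_row
# [1] 1 2 1 2 1
```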