Tags: r, dplyr, spatial, geosphere

Distance between coordinates in dataframe sequentially?


I have a dataframe of lat/lon coordinates, which are essentially GPS signals. I need to calculate the distance between sequential rows and then check that it doesn't exceed a specific threshold I'm interested in.

Here is an example dataset:

library(geosphere)
library(tidyverse)

Seqlat <- seq(from = -90, to = 90, by = .01)
Seqlong <- seq(from = -180, to = 180, by = .01)
Latitude <- sample(Seqlat, size = 100, replace = TRUE)
Longitude <- sample(Seqlong, size = 100, replace = TRUE)

df <- data.frame(Latitude, Longitude)

I know I can use the geosphere::distm() function to calculate the distance between a pair of coordinates. This works if I extract them individually from the dataframe:


distm(c(df$Longitude[1], df$Latitude[1]),
  c(df$Longitude[2], df$Latitude[2]),
  fun = distHaversine)

However, when I try to do this inside the dataframe, it doesn't work. I tried to exclude the last row from the calculation, hoping that I would get a distance for all the other rows, but this didn't work:

df %>% mutate(distance = ifelse(row_number() == n(), distm(
  c(Longitude, Latitude),
  c(lead(Longitude), lead(Latitude)),fun = distHaversine
), NA))

Ideally, what I would like is a distance between each consecutive pair of coordinates in a new column. The last row would not have a distance as there isn't a subsequent row from which to calculate it.


Solution

df["distance"] <- c(NA,
                    sapply(seq.int(2, nrow(df)), function(i) {
                      distm(c(df$Longitude[i - 1], df$Latitude[i - 1]),
                            c(df$Longitude[i], df$Latitude[i]),
                            fun = distHaversine)
                    }))

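This builds a vector that starts with NA for the first row, then iterates from the second row to the last, computing the distance between each row and the row before it and appending the result. With distHaversine's default radius the distances are in meters. Note that the NA lands in the first row here (each value is the distance from the previous point), not in the last row as described in the question.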
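
If the NA should sit in the last row instead, as the question asks, one possible variation is to skip the per-row loop and hand distHaversine() two matrices: rows 1 to n-1 and rows 2 to n. This is only a sketch. It assumes the column names from the question, generates its own example data with runif() so that it runs on its own, and uses a made-up threshold (threshold_m) purely for illustration.

```r
library(geosphere)

# Example data in the same shape as the question's df (values are random)
set.seed(42)
df <- data.frame(
  Latitude  = runif(100, min = -90,  max = 90),
  Longitude = runif(100, min = -180, max = 180)
)

# geosphere expects points as (longitude, latitude), in that order
pts <- as.matrix(df[, c("Longitude", "Latitude")])
n <- nrow(pts)

# Pair row i with row i + 1; distHaversine works row-wise on two matrices
step_m <- distHaversine(pts[-n, , drop = FALSE], pts[-1, , drop = FALSE])

# Pad with NA so the last row, which has no following point, stays empty
df$distance <- c(step_m, NA)

# Hypothetical threshold in meters, only to illustrate the follow-up check
threshold_m <- 1e6
df$exceeds <- df$distance > threshold_m
```

Each value in distance is then the distance in meters from that row to the next one, and exceeds is NA for the last row because there is nothing to compare it against.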