I have a data frame of lat/lon coordinates, which are essentially GPS signals. I need to calculate the distance between sequential rows so that I can check it doesn't exceed a specific threshold I'm interested in.
Here is an example dataset:
library(geosphere)
library(tidyverse)
Seqlat <- seq(from = -90, to = 90, by = .01)
Seqlong <- seq(from = -180, to = 180, by = .01)
Latitude <- sample(Seqlat, size = 100, replace = TRUE)
Longitude <- sample(Seqlong, size = 100, replace = TRUE)
df <- data.frame(Latitude, Longitude)
I know I can use the geosphere::distm() function to calculate the distance between a pair of coordinates. This works if I extract them individually from the data frame:
distm(c(df$Longitude[1], df$Latitude[1]),
c(df$Longitude[2], df$Latitude[2]),
fun = distHaversine)
However, when I try to do this inside the data frame, it doesn't work. I tried to exclude the last row from the calculation, hoping I would get a distance for all the other rows, but this didn't work either:
df %>% mutate(distance = ifelse(row_number() == n(), distm(
c(Longitude, Latitude),
c(lead(Longitude), lead(Latitude)),fun = distHaversine
), NA))
Ideally, what I would like is a distance between each consecutive pair of coordinates in a new column. The last row would not have a distance as there isn't a subsequent row from which to calculate it.
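# The first row has no previous point, so its distance is NA
df["distance"] <- c(NA,
  sapply(seq.int(2, nrow(df)), function(i) {
    # Distance in metres from row i-1 to row i
    distm(c(df$Longitude[i - 1], df$Latitude[i - 1]),
          c(df$Longitude[i], df$Latitude[i]),
          fun = distHaversine)
  })
)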
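This builds a vector that begins with NA for the first row, then iterates from the second row to the last, computing the distance from the previous row at each step and adding it to the vector. Each distance is therefore stored on the later row of its pair, so the NA ends up in the first row rather than the last.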
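If the loop isn't needed, a vectorized version may be simpler. The following is only a minimal sketch: it assumes distHaversine() is given two-column matrices of (longitude, latitude) and returns one distance per pair of rows, which is how its documentation describes it. The data frame `pts`, its values, and the column name `distance_next` are made up for illustration and are not from the question. The NA is placed on the last row here, matching the layout the question asks for.

```r
library(geosphere)

# Small illustrative data set (hypothetical values, not from the question)
pts <- data.frame(
  Latitude  = c(51.50, 48.85, 40.71, 35.68),
  Longitude = c(-0.12, 2.35, -74.00, 139.69)
)

n <- nrow(pts)

# Start points are rows 1..(n-1); end points are rows 2..n
from <- cbind(pts$Longitude[-n], pts$Latitude[-n])
to   <- cbind(pts$Longitude[-1], pts$Latitude[-1])

# One Haversine distance (in metres, with the default radius) per consecutive pair;
# the last row has no following point, so it gets NA
pts$distance_next <- c(distHaversine(from, to), NA)
```

Slicing off the first and last rows before calling distHaversine() means the function never sees an NA coordinate, so the result does not depend on how it treats missing values. The same idea should carry over to the original `df` by swapping `pts` for `df`.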