Tags: r, group-by, dplyr, distance, geosphere

Calculate distance between consecutive rows, by group


Morning, afternoon, evening

I have the following boat data:

set.seed(123)

df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5 ))

I group the data by the factor variable fac.

library(dplyr)

df_grouped <- df %>% 
  group_by(fac) %>% 
  summarise(first_lon = first(lon),
            last_lon  = last(lon),
            first_lat = first(lat),
            last_lat  = last(lat))

I use the first and last latitudes (lat) and longitudes (lon) of each group to create polygons.

I also use them to estimate the distance across each polygon.

library(geosphere)

df_grouped %>% 
  mutate(distance_m = distHaversine(matrix(c(first_lon, first_lat), ncol = 2),
                                    matrix(c(last_lon, last_lat),   ncol = 2)))
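For reference, the great-circle calculation behind this estimate can be sketched in base R. This is only an illustration of the haversine formula, not geosphere's actual source: the helper name haversine_m is hypothetical, and the radius of 6378137 m is assumed to match the default of distHaversine.

```r
# Hypothetical helper (not part of geosphere): great-circle distance in metres
# between (lon1, lat1) and (lon2, lat2), all given in decimal degrees.
# r = 6378137 is assumed to match the default radius used by distHaversine.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  # haversine of the central angle between the two points
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian is r * pi / 180 metres (roughly 111 km)
one_degree <- haversine_m(-5.5, 45, -5.5, 46)
```

Because the formula only sees two end points, it returns the shortest path over the sphere between them, regardless of the route taken in between.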

However, this estimate assumes the boat travels in a straight line across the longest possible distance within the polygon.

This is not always true; sometimes the boat wiggles about a bit.

What I would like to do is calculate the actual distance the boat has traveled, by summing the distances between consecutive rows within each group.

For example, for fac == "C", the boat will have traveled x meters, where x is the sum of the distances between consecutive data points within that group.


Solution

  • Try:

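    df %>%
      group_by(fac) %>%
      # previous position within each group (NA for the first row of a group)
      mutate(lat_prev = lag(lat, 1), lon_prev = lag(lon, 1)) %>%
      # distance from the previous row to the current one
      mutate(dist = distHaversine(matrix(c(lon_prev, lat_prev), ncol = 2),
                                  matrix(c(lon, lat), ncol = 2))) %>%
      # total per group; the NA for each group's first row is dropped
      summarize(dist = sum(dist, na.rm = TRUE))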
    
    # A tibble: 3 x 2
      fac      dist
      <fct>   <dbl>
    1 A      93708.
    2 B     219742.
    3 C     347578.
    
    

    Much better, as suggested by Henrik:

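    df %>%
      group_by(fac) %>%
      # given a single matrix, distHaversine returns the distances between
      # consecutive rows, so they can be summed in one summarize() call
      summarize(dist = sum(distHaversine(cbind(lon, lat))))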
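If the individual segment lengths are also of interest, the same single-matrix call to distHaversine can be used inside mutate() to keep one distance per row plus a running total. This is a minimal sketch, assuming every group has at least two rows; the column names step_m and cum_m and the object df_track are made up for illustration.

```r
library(dplyr)
library(geosphere)

# Same example data as in the question
set.seed(123)
df <- data.frame(
  fac = as.factor(c("A", "A", "A", "A",
                    "B", "B", "B",
                    "C", "C", "C", "C", "C")),
  lat = runif(12, min = 45, max = 47),
  lon = runif(12, min = -6, max = -5))

df_track <- df %>%
  group_by(fac) %>%
  mutate(
    # distance from the previous row to this one; 0 for the first row of a group
    # (assumes at least two rows per group)
    step_m = c(0, distHaversine(cbind(lon, lat))),
    # distance traveled so far within the group
    cum_m = cumsum(step_m)
  ) %>%
  ungroup()
```

The last cum_m value in each group is then the total distance traveled for that group, and step_m shows which legs contribute most to it.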