
Calculate geographic distance from the previous row by group using dplyr and geosphere


I have a dataframe like this.

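df <- data.frame(
  id = c(rep("A", 5), rep("B", 5)),
  # one row per day, 2022-06-01 to 2022-06-10
  date = seq(as.Date("2022-06-01"), by = "day", length.out = 10),
  # coordinates with .01 offsets, matching the printed output below
  lon = 101:110 + 0.01,
  lat = 1:10 + 0.01
)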
> df
   id       date    lon   lat
1   A 2022-06-01 101.01  1.01
2   A 2022-06-02 102.01  2.01
3   A 2022-06-03 103.01  3.01
4   A 2022-06-04 104.01  4.01
5   A 2022-06-05 105.01  5.01
6   B 2022-06-06 106.01  6.01
7   B 2022-06-07 107.01  7.01
8   B 2022-06-08 108.01  8.01
9   B 2022-06-09 109.01  9.01
10  B 2022-06-10 110.01 10.01

I want to calculate the distance traveled each day within each group (A and B) and store it in a new column called dist.

I figured that dplyr::lag and geosphere::distGeo would help, so I tried the following code.

df %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(dist = distGeo(.[, c(lon, lat)],
                          lag(.[, c(lon, lat)], default = first(.[, c(lon, lat)]))))

But it fails with this error:

Error in `mutate()`:
! Problem while computing `dist = distGeo(...)`.
ℹ The error occurred in group 1: id = "A".
Caused by error in `vectbl_as_col_location()`:
! Must subset columns with a valid subscript vector.
✖ Can't convert from `j` <double> to <integer> due to loss of precision.

I guess there is a syntax error in mutate(), but how can I fix it?
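The error is not really a syntax error. Inside mutate(), bare column names such as lon and lat evaluate to the column values of the current group, so c(lon, lat) is a numeric vector like c(101.01, 102.01, ..., 1.01, 2.01, ...), and .[, c(lon, lat)] then tries to use those doubles as column positions, which tibble rejects because 101.01 is not a whole number. (The . placeholder also refers to the whole data frame piped into mutate(), not the current group.) A minimal sketch reproducing this, assuming dplyr is installed; the two-row df below is a hypothetical subset of the question's data:

```r
library(dplyr)

# Hypothetical two-row subset of the question's data (.01 offsets, as printed)
df <- data.frame(id = c("A", "A"), lon = c(101.01, 102.01), lat = c(1.01, 2.01))

# Inside a data-masking verb, lon and lat are the column *values*,
# so c(lon, lat) is just a numeric vector of coordinates, not column names
inside <- df %>%
  group_by(id) %>%
  summarise(j = list(c(lon, lat)))

inside$j[[1]]

# Using those values as column positions fails, because 101.01 is
# neither a column name nor a whole-number column index
res <- tryCatch(
  as_tibble(df)[, inside$j[[1]]],
  error = function(e) conditionMessage(e)
)
```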


Solution

  • It is probably easiest to copy the previous day's lon/lat values into separate columns and then do the calculation row by row:

    library(tidyverse)
    library(geosphere)
    
    df <- data.frame(
      id = c(rep("A", 5), rep("B", 5)),
      date = as.Date(as.Date("2022-6-1"):as.Date("2022-6-10"), origin="1970-01-01"),
      lon = 101:110,
      lat = 1:10
    )
    
    df %>% group_by(id) %>%
      # previous day's coordinates within each id (NA for the first day)
      mutate(across(c(lon, lat), ~ lag(.x, order_by = date), .names = "prev_{.col}")) %>%
      rowwise() %>%
      mutate(dist = distGeo(c(lon, lat), c(prev_lon, prev_lat))) %>%
      ungroup()
    #> # A tibble: 10 × 7
    #>    id    date         lon   lat prev_lon prev_lat    dist
    #>    <chr> <date>     <int> <int>    <int>    <int>   <dbl>
    #>  1 A     2022-06-01   101     1       NA       NA     NA 
    #>  2 A     2022-06-02   102     2      101        1 156876.
    #>  3 A     2022-06-03   103     3      102        2 156829.
    #>  4 A     2022-06-04   104     4      103        3 156759.
    #>  5 A     2022-06-05   105     5      104        4 156666.
    #>  6 B     2022-06-06   106     6       NA       NA     NA 
    #>  7 B     2022-06-07   107     7      106        6 156409.
    #>  8 B     2022-06-08   108     8      107        7 156246.
    #>  9 B     2022-06-09   109     9      108        8 156060.
    #> 10 B     2022-06-10   110    10      109        9 155851.
    

    Created on 2022-06-15 by the reprex package (v2.0.1)
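The rowwise() step is probably not strictly necessary: geosphere::distGeo() is vectorized over the rows of two-column (lon, lat) matrices and, as the NA rows above show, returns NA where a coordinate is missing, while lag() inside a grouped mutate() only looks back within the same group. A minimal sketch of that vectorized variant, assuming the same integer example data as the reprex above; it should reproduce the same dist column:

```r
library(dplyr)
library(geosphere)

# Same example data as the reprex (integer lon/lat)
df <- data.frame(
  id = c(rep("A", 5), rep("B", 5)),
  date = seq(as.Date("2022-06-01"), by = "day", length.out = 10),
  lon = 101:110,
  lat = 1:10
)

res <- df %>%
  group_by(id) %>%
  # make sure lag() looks back to the previous *day* within each id
  arrange(date, .by_group = TRUE) %>%
  # one distGeo() call per group on whole columns; lag() restarts in each
  # group, so the first row of every id gets NA as its distance
  mutate(dist = distGeo(cbind(lon, lat), cbind(lag(lon), lag(lat)))) %>%
  ungroup()

res
```

Avoiding rowwise() replaces one distGeo() call per row with one call per group, which tends to matter on larger data sets.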