Tags: r, for-loop, geospatial, data-wrangling

How to add a column to my data frame that calculates the distance between each lat/long point and the previous point with the same ID


I have a data frame of individual animals with a unique ID, the lat/long where each animal was found, and the date it was found. The database has frequent returns of the same individual, and there are over 2000 individuals. I want to add a column to my data frame that calculates the Euclidean distance between the current location and the previous location of the same individual. I also want a second column that tells me which calculation number I'm on for each individual. The data frame is already sorted by date. I'm trying to solve this in R.

Event ID Lat Long
1 1 31.89 -80.98
2 2 31.54 -80.12
3 1 31.45 -81.92
4 1 31.64 -81.82
5 2 31.23 -80.98

After adding the columns, it should look like this:

Event  ID  Lat    Long    Distance                        Calculation #
1      1   31.89  -80.98  -                               0
2      2   31.54  -80.12  -                               0
3      1   31.45  -81.92  Distance between events 1 & 3   1
4      1   31.64  -81.82  Distance between events 3 & 4   2
5      2   31.23  -80.98  Distance between events 2 & 5   1
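The "Calculation #" column is a running count within each ID that starts at 0 for an individual's first record. As a minimal base-R sketch, assuming the rows are already in date order as stated, `ave()` can produce it without a loop:

```r
# Example data from the question, already sorted by date
dat <- data.frame(
  Event = 1:5,
  ID    = c(1, 2, 1, 1, 2),
  Lat   = c(31.89, 31.54, 31.45, 31.64, 31.23),
  Long  = c(-80.98, -80.12, -81.92, -81.82, -80.98)
)

# seq_along() numbers the rows within each ID (1, 2, 3, ...);
# subtracting 1 makes an individual's first record calculation 0
dat$Calculation <- ave(dat$Event, dat$ID, FUN = seq_along) - 1
dat$Calculation
# [1] 0 0 1 2 1
```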

Is there a faster way to do this without a for loop? I'm stuck on where to start. I know I can use a distance function from a geospatial package once I have the unique IDs sorted, but I'm having trouble iterating through my data.
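Because the rows are already in date order, each individual's previous location can be looked up without a for loop by shifting Lat and Long by one row within each ID. The base-R sketch below is one way to do that under stated assumptions, not the method used in the answer: it swaps in the haversine great-circle formula on a sphere of radius 6,371 km, so its results should be close to, but not necessarily identical with, the distances sf reports. `haversine()` and `prev_in_group()` are hypothetical helper names.

```r
# Example data from the question, already sorted by date
dat <- data.frame(
  Event = 1:5,
  ID    = c(1, 2, 1, 1, 2),
  Lat   = c(31.89, 31.54, 31.45, 31.64, 31.23),
  Long  = c(-80.98, -80.12, -81.92, -81.82, -80.98)
)

# Hypothetical helper: great-circle (haversine) distance in metres,
# assuming a spherical Earth with radius r
haversine <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Hypothetical helper: value from the previous row with the same ID,
# NA for an individual's first record (relies on rows being in date order)
prev_in_group <- function(x, g) {
  ave(x, g, FUN = function(v) c(NA, head(v, -1)))
}

prev_lat  <- prev_in_group(dat$Lat,  dat$ID)
prev_long <- prev_in_group(dat$Long, dat$ID)

# Vectorised over all rows at once; NA where there is no previous point
dat$Dist <- haversine(prev_lat, prev_long, dat$Lat, dat$Long)
dat
```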


Solution

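  • Here is one option that leans on the sf and dplyr packages. The function sf::st_distance calculates distances between pairs of points, and dplyr::lag can be used to look "one row behind" within each group. Note that for longitude/latitude coordinates st_distance returns great-circle distances in metres rather than Euclidean distances. You will want to confirm your coordinate system; I guessed WGS 84 (EPSG:4326) here.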

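    library(dplyr)
    library(sf)
    
    dat <- read.table(text = " Event    ID  Lat Long
    1   1   31.89   -80.98
    2   2   31.54   -80.12
    3   1   31.45   -81.92
    4   1   31.64   -81.82
    5   2   31.23   -80.98", header = TRUE)
    
    # Convert to sf; coordinates are (x = Long, y = Lat), assumed WGS 84 (EPSG:4326)
    dat_sf <- st_as_sf(dat, coords = c('Long', 'Lat'), crs = 4326)
    
    dat_sf %>%
      arrange(ID, Event) %>%   # sort by individual, keeping event (date) order
      group_by(ID) %>%         # so lag() only looks back within one individual
      mutate(
        # distance in metres to the previous point of the same ID (NA for the first)
        distance = as.numeric(st_distance(geometry, lag(geometry), by_element = TRUE)),
        # 0 for an individual's first record, then 1, 2, ...
        calculation = row_number() - 1
      )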
    #> Simple feature collection with 5 features and 4 fields
    #> Geometry type: POINT
    #> Dimension:     XY
    #> Bounding box:  xmin: -81.92 ymin: 31.23 xmax: -80.12 ymax: 31.89
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 5 x 5
    #> # Groups:   ID [2]
    #>   Event    ID       geometry distance calculation
    #> * <int> <int>    <POINT [°]>    <dbl>       <dbl>
    #> 1     1     1 (-80.98 31.89)      NA            0
    #> 2     3     1 (-81.92 31.45)  101524.           1
    #> 3     4     1 (-81.82 31.64)   23155.           2
    #> 4     2     2 (-80.12 31.54)      NA            0
    #> 5     5     2 (-80.98 31.23)   88615.           1
    

    Created on 2022-11-14 by the reprex package (v2.0.0)
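Because the question asks for Euclidean distance, one alternative is to project the points to a planar coordinate system first and then take straight-line distances between the projected coordinates. The sketch below assumes UTM zone 17N (EPSG:32617), which spans roughly 84°W to 78°W and so covers these example points; swap in a projected CRS suited to your own study area. It also drops arrange() so the rows stay in event order, as in the question's desired output. For points this close together, the planar and great-circle distances should differ only slightly.

```r
library(dplyr)
library(sf)

dat <- data.frame(
  Event = 1:5,
  ID    = c(1L, 2L, 1L, 1L, 2L),
  Lat   = c(31.89, 31.54, 31.45, 31.64, 31.23),
  Long  = c(-80.98, -80.12, -81.92, -81.82, -80.98)
)

# Project from WGS 84 (EPSG:4326) to UTM zone 17N (EPSG:32617), units in metres.
# Assumption: zone 17N suits these points; choose a CRS for your own area.
xy <- dat %>%
  st_as_sf(coords = c("Long", "Lat"), crs = 4326) %>%
  st_transform(32617) %>%
  st_coordinates()

res <- dat %>%
  mutate(x = xy[, "X"], y = xy[, "Y"]) %>%
  group_by(ID) %>%   # no arrange(): rows keep their original event order
  mutate(
    # planar Euclidean distance to the previous point of the same ID
    euclid_m = sqrt((x - lag(x))^2 + (y - lag(y))^2),
    # 0 for an individual's first record, then 1, 2, ...
    calculation = row_number() - 1
  ) %>%
  ungroup()

res
```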