I have a data frame of individual animals with a unique ID, the lat/long where they were found, and the date they were found. The database has frequent returns of the same individual. I have over 2000 individuals. I want to add a column to my data frame to calculate the Euclidean distance between the current location and the previous location. I want to add a second column to tell me which calculation number I'm on for each individual. The data frame is already organized by sequential date. I'm trying to solve this in R.
Event | ID | Lat | Long |
1 | 1 | 31.89 | -80.98 |
2 | 2 | 31.54 | -80.12 |
3 | 1 | 31.45 | -81.92 |
4 | 1 | 31.64 | -81.82 |
5 | 2 | 31.23 | -80.98 |
I want to add two columns so that it looks like this:
Event | ID | Lat | Long | Dist. | Calculation # |
1 | 1 | 31.89 | -80.98 | - | 0 |
2 | 2 | 31.54 | -80.12 | - | 0 |
3 | 1 | 31.45 | -81.92 | Distance between event 1 & 3 | 1 |
4 | 1 | 31.64 | -81.82 | Distance between event 3 & 4 | 2 |
5 | 2 | 31.23 | -80.98 | Distance between event 2 & 5 | 1 |
Is there a faster way to do this without a for loop? I'm stuck on where to start. I know I can use a distance function from the geospatial package once I have the unique IDs sorted, but I'm having trouble iterating through my data.
Here is one option which leans on the sf package and dplyr. The function sf::st_distance calculates distances between pairs of points, and dplyr::lag can be used to look "one row behind". You will want to confirm your coordinate system, which I guessed here is WGS84:
library(sf)
library(dplyr)

# Example data from the question
dat <- read.table(text = " Event ID Lat Long
1 1 31.89 -80.98
2 2 31.54 -80.12
3 1 31.45 -81.92
4 1 31.64 -81.82
5 2 31.23 -80.98", header = TRUE)
dat_sf <- st_as_sf(dat, coords = c('Long', 'Lat'), crs = 4326)
dat_sf %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(distance = as.numeric(st_distance(geometry, lag(geometry), by_element = TRUE)),
         calculation = row_number() - 1)
#> Simple feature collection with 5 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -81.92 ymin: 31.23 xmax: -80.12 ymax: 31.89
#> Geodetic CRS: WGS 84
#> # A tibble: 5 x 5
#> # Groups: ID [2]
#> Event ID geometry distance calculation
#> * <int> <int> <POINT [°]> <dbl> <dbl>
#> 1 1 1 (-80.98 31.89) NA 0
#> 2 3 1 (-81.92 31.45) 101524. 1
#> 3 4 1 (-81.82 31.64) 23155. 2
#> 4 2 2 (-80.12 31.54) NA 0
#> 5 5 2 (-80.98 31.23) 88615. 1
Created on 2022-11-14 by the reprex package (v2.0.0)
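If you would rather not depend on sf, the same "look one row behind within each ID" idea can be sketched in base R. This is only a rough alternative, not the approach above: it swaps sf::st_distance for a hand-written haversine formula, and it assumes a spherical Earth with a mean radius of 6371 km. The helper names haversine_m and lag1 are made up for this example. Distances should come out close to the sf values, but not necessarily identical.

```r
# Haversine great-circle distance in metres between lon/lat points given in degrees.
# Assumption: spherical Earth with mean radius 6371 km (hypothetical helper).
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Example data from the question
dat <- data.frame(
  Event = 1:5,
  ID    = c(1, 2, 1, 1, 2),
  Lat   = c(31.89, 31.54, 31.45, 31.64, 31.23),
  Long  = c(-80.98, -80.12, -81.92, -81.82, -80.98)
)

# Shift a vector down by one position, padding with NA (hypothetical helper).
lag1 <- function(x) c(NA, x[-length(x)])

# ave() applies the function within each ID and keeps the original row order.
prev_lat  <- ave(dat$Lat,  dat$ID, FUN = lag1)
prev_long <- ave(dat$Long, dat$ID, FUN = lag1)

# Distance to the previous sighting of the same individual (NA for the first one).
dat$distance <- haversine_m(prev_long, prev_lat, dat$Long, dat$Lat)

# Running count of sightings per individual, starting at 0.
dat$calculation <- ave(seq_along(dat$ID), dat$ID, FUN = seq_along) - 1

dat
```

Because ave() writes results back in the original row order, the rows stay sorted by Event, as in the layout the question asks for, instead of being rearranged by ID. This relies on the data already being in date order within each individual.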