
How do I correct this code to properly measure distance between points?


I am working with an animal movement dataset. I want to compute movement distances in code rather than with the distance formula in Excel, so that I can apply the same process to potentially larger datasets. I found this old post (Calculating Daily Scaled Travel Distance for UTM Animal Movement Data in R), which has generally worked. Here is my current progress:

library(sf)
library(readr)
library(dplyr)

## Load in the data and clean it to ensure correct format.
ToadDistCalc <- read_csv("Documents/RProjects/ToadMovement/ToadDistCalc.csv")
ToadDistCalc$DateTime <- as.POSIXct(ToadDistCalc$DateTime,
                                    format = "%Y-%m-%d %H:%M:%S",
                                    tz = "America/Jamaica")

## Make the data a spatial object. Look up what the correct EPSG is.
toad.so <- st_as_sf(
  ToadDistCalc,
  coords = c('X', 'Y'),
  crs = "EPSG:26918")

## Calculate distance. group_by attribute dependent on data.
toad.so <- toad.so %>%
  group_by(ID) %>%
  mutate(
    lead = geometry[row_number() + 1],
    dist = st_distance(geometry, lead, by_element = TRUE))

However, when I inspect the data, the movement distances are incorrectly assigned to the date before they occurred.

If I try to fix this by changing the lead variable to use row_number() - 1, I get an error.

Changing the position of the variable names does not affect the error. I would also like to average the movement by day. I can do that easily in Excel, but I would like to know how to do it in code.

UPDATE:

Here is a version of the code that allows for time averaging:


library(sf)
library(readr)
library(dplyr)

## Load in the data and clean it to ensure correct format.
ToadDistCalc <- read_csv("Documents/RProjects/ToadMovement/ToadDistCalc.csv")
ToadDistCalc$DateTime <- as.POSIXct(ToadDistCalc$DateTime,
                                    format = "%Y-%m-%d %H:%M:%S",
                                    tz = "America/Jamaica")

## Make the data a spatial object. Look up what the correct EPSG is.
toad.so <- st_as_sf(
  ToadDistCalc,
  coords = c('X', 'Y'),
  crs = "EPSG:26918")

## Calculate distance. group_by attribute dependent on data.
succ.dist = function(toad){
  c(0,
    st_distance(
      toad$geometry[-nrow(toad)],
      toad$geometry[-1],
      by_element=TRUE))
}

step.distances = unlist(lapply(split(toad.so, toad.so$ID), succ.dist))
toad.so$step.distances = step.distances

## Time averaged
succ.date = function(toadDate){
  c(0, difftime(
    strptime(toadDate$Date[-1], "%m/%d/%Y"),
    strptime(toadDate$Date[-nrow(toadDate)], "%m/%d/%Y"),
    units="days"))
}

DaysBetween = unlist(lapply(split(toad.so, toad.so$ID), succ.date))
toad.so$DaysBetween = DaysBetween

toad.so$AvgdDist <- step.distances/DaysBetween



Solution

  • It looks like you are trying to calculate the distance between successive geometry entries within each ID, that is, for each toad.

    Let's make a sample data set of random points on the UK grid system: 5 toads, each with 20 points.

    > pts = st_as_sf(data.frame(x=runif(100), y=runif(100)), coords=1:2, crs="EPSG:27700")
    > pts$ID = rep(1:5, each=20)
    

    To compute the successive distances in a geometry, use st_distance but drop the last element from the first argument and the first element from the second argument. You are then comparing point 1 with 2, then 2 with 3, and so on. Prepend a 0 as the distance travelled at the first point, so that the result has one value per row and each distance sits on the row where that move ended.

    succ.dist = function(toad){
      c(0,
        st_distance(
          toad$geometry[-nrow(toad)],
          toad$geometry[-1],
          by_element=TRUE))
    }
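
As a rough illustration of the pairing described above, here is a sketch on a plain numeric vector (a hypothetical stand-in for one toad's ordered fixes, not sf geometry). It also shows one likely reason an index built with row_number() - 1 fails inside mutate(): R silently drops index 0, so the shifted vector comes back one element short. Whether that is the exact error from the question is an assumption.

```r
# Hypothetical stand-in for one animal's ordered positions.
x <- c(10, 20, 30, 40)
n <- length(x)

# Dropping the last element and the first element lines up
# each point with its successor: (10,20), (20,30), (30,40).
from <- x[-n]
to   <- x[-1]

# Prepending 0 restores one value per row.
steps <- c(0, abs(to - from))

# An index vector starting at 0 (what row_number() - 1 produces)
# loses an element, because R ignores index 0 when subsetting.
idx     <- seq_len(n) - 1
shifted <- x[idx]
length(shifted) == n - 1
```

Because steps has the same length as x, it can be assigned as a column, whereas shifted cannot without padding.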
    

    That works on a whole data frame, treating it as a single track. To get the distances for each toad, apply it to each ID value separately using a split-apply-combine strategy:

    step.distances = unlist(lapply(split(pts, pts$ID), succ.dist))
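
One caveat with unlist(lapply(split(...))): the values come back in group order, so assigning them straight to a column assumes the rows are already sorted by ID (which holds for the sample data here). A minimal sketch with hypothetical toy data, using base R's unsplit() to put each value back on its original row; the data frame df and the helper step_1d are made up for illustration:

```r
# Hypothetical toy data: rows are deliberately NOT sorted by ID.
df <- data.frame(ID = c("a", "b", "a", "b", "a"),
                 x  = c(0, 10, 3, 14, 7))

# One-dimensional step length within a group, 0 for the first row.
step_1d <- function(d) c(0, abs(diff(d$x)))

pieces <- lapply(split(df, df$ID), step_1d)

# unlist() concatenates group by group: all of "a", then all of "b".
grouped_order <- unname(unlist(pieces))

# unsplit() reverses split(), restoring the original row order.
df$step <- unsplit(pieces, df$ID)
```

If the data are sorted by ID beforehand, both approaches give the same column.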
    

    Now you can add that to your data frame:

    > pts$step.distances = step.distances
    > head(pts)
    Simple feature collection with 6 features and 2 fields
    Geometry type: POINT
    Dimension:     XY
    Bounding box:  xmin: 0.08351158 ymin: 0.161554 xmax: 0.9587816 ymax: 0.7150356
    Projected CRS: OSGB36 / British National Grid
                         geometry ID step.distances
    1 POINT (0.1272845 0.4104881)  1      0.0000000
    2  POINT (0.4923749 0.161554)  1      0.4418814
    3  POINT (0.08351158 0.70452)  1      0.6796920
    4   POINT (0.9587816 0.38298)  1      0.9324621
    5 POINT (0.5181507 0.7150356)  1      0.5517396
    6 POINT (0.3361486 0.4482891)  1      0.3229218
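
For the dplyr style used in the question, and for the per-day averaging, one possible sketch works on the numeric coordinate columns with lag() instead of shifting the geometry column. It assumes projected coordinates in metres (as with UTM), where the straight-line formula is appropriate, and it assumes X, Y and DateTime columns like those in the question. The data below are invented for illustration. Note that st_as_sf() drops the coordinate columns by default, so this would run before the conversion.

```r
library(dplyr)

# Hypothetical tracking data: two animals, projected coordinates in metres.
tracks <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  DateTime = as.POSIXct(c("2023-05-01 08:00:00", "2023-05-02 08:00:00",
                          "2023-05-04 08:00:00", "2023-05-01 09:00:00",
                          "2023-05-02 09:00:00"), tz = "UTC"),
  X = c(0, 3, 3, 10, 10),
  Y = c(0, 4, 10, 10, 15)
)

steps <- tracks %>%
  arrange(ID, DateTime) %>%
  group_by(ID) %>%
  mutate(
    # Distance from the previous fix, stored on the row where the move
    # ended; the first fix of each animal gets 0.
    step_dist = coalesce(sqrt((X - lag(X))^2 + (Y - lag(Y))^2), 0),
    # Elapsed days since the previous fix (NA for the first fix).
    days_between = as.numeric(difftime(DateTime, lag(DateTime),
                                       units = "days")),
    # Distance per day; NA for the first fix rather than 0 / 0.
    dist_per_day = step_dist / days_between
  ) %>%
  ungroup()
```

Leaving days_between as NA for the first fix avoids the 0 / 0 division that a leading 0 would produce.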