Group GPS locations based on the first and last two points of the event

Everyone.

I'm trying to identify attempted road crossing events using GPS data. My initial plan was to buffer the road by two distances (10km, 500m) on the west and east side of a road, intersect the GPS data with the buffers, and identify/group instances in which an individual travels from the 10km buffer into the 500m buffer and back to the 10km buffer on the same side of the road. The number of points that fall within the 500m buffer and between the 10km points could range from 1 to n-2 points meaning the groupings of attempted road crossing events will be a minimum of 3 points to a maximum of n total points. I'm not quite sure how to identify these groupings as the group would have to be identified by the first and last two rows of the grouping with all groups potentially being different lengths. I've provided simple example input and results datasets. Note that the attempted crossing events would also need to be grouped by id and year. That isn't an issue but something I thought I'd add for clarity. Also, if you think there's a better way of accomplishing this task please feel free to suggest a solution.

Thanks for your help!

library(tidyverse)

set.seed(321)

# Example data set. 20 individuals across 20 years with locations assigned to either the west or east side of the road and in the 10km or 500m buffers.
df<-tibble(id = rep(1:20,
                    each = 10),
           year = rep(2000:2019,
                      each = 10),
           location = sample(c('west_10km',
                               'west_500m',
                               'east_10km',
                               'east_500m'),
                             size = 200,
                             replace = TRUE))

# What the results should look like when attempted crossing events grouped by id and year have been pulled from the df dataset.
results<-tibble(id = c(1,1,1,1,1,1,9,9,9,9,9,15,15,15,
                       19,19,19,19,19,19,19,20,20,20,20),
                year = c(2000,2000,2000,2000,2000,2000,2008,2008,
                         2008,2008,2008,2014,2014,2014,2018,2018,
                         2018,2018,2018,2018,2018,2019,2019,2019,2019),
                location = c('west_10km','west_500m','west_10km','west_10km',
                         'west_500m','west_10km','east_10km','east_500m',
                         'east_500m','east_500m','east_10km','west_10km',
                         'west_500m','west_10km','east_10km','east_500m',
                         'east_500m','east_10km','east_10km','east_500m',
                         'east_10km','west_10km','west_500m','west_500m',
                         'west_10km'),
                group_event = c(1,1,1,2,2,2,3,3,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7)))

EDIT: Thanks to @Skaaqs for their help. This isn't the first time I've forgotten about lag()/lead(), but I think this time it will stick. I had to modify a few lag()/lead() arguments to correctly identify the row of interest. I also wasn't sure how to identify all of the 'in/500m' between the starting and ending 'out/10km' points so I did so by a brute force. It's not pretty but it worked as long as there weren't >10 'in' locations within a crossing attempt. I then added another ifelse() statement to identify the point at which an individual left the area of interest. Unfortunately, events in which the individual did successfully crossed were pulled in so I added a simple quality control check by specifying that the first and last location in an event needed to be on the same side and out of the 'in' zone (e.g., east_10km, east_500m,..., east_10km). Again, if anyone has a cleaner solution, please feel free to chime in.

df_results <- df %>%
  tidyr::separate(col = "repulsion_location", 
                  sep = "_", 
                  into = c("side", 
                           "dist")) %>%
  mutate(distm = ifelse(test = dist == "500m", 
                        yes = 500,
                        no = 10000)) %>%
  group_by(id, year) %>%
  mutate(cross = ifelse(test = side == lag(side), 
                        yes = 1, 
                        no = 0)) %>%
  mutate(cross_atmpt = ifelse(test = side == lag(side) & distm < lag(distm) & side == lead(side), 
                              yes = 1,
                              no = 0),
         cross_atmpt =  ifelse(test = side == lead(side) & lead(cross_atmpt) == 1, 
                  yes = 1,
                  no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt =  ifelse(test = side == lag(side) & lag(cross_atmpt) == 1 & distm == 500, 
                               yes = 1,
                               no = cross_atmpt),
         cross_atmpt = ifelse(test = side == lag(side) & distm > lag(distm) & lag(cross_atmpt) == 1, 
                              yes = 1,
                              no = cross_atmpt),
         cross_atmpt = ifelse(test = is.na(cross_atmpt), 
                              yes = 0, 
                              no = cross_atmpt)) %>% 
  group_by(crossing_event_id = cumsum(cross_atmpt == 0)) %>% 
  filter(cross_atmpt == 1) %>% 
  group_by(crossing_event_id) %>% 
  mutate(crossing_event_id = cur_group_id())

    #' _____________________________________________________________________________
    #' ## Quality Control
    #' 
    # Check if each crossing attempt is bookended by an out/10km point.
    repulsion_pts_clean <- repulsion_pts %>% 
      group_by(crossing_event_id) %>% 
      filter(first(distm) == 10000 & last(distm) == 10000)
    
    repulsion_pts_null <- repulsion_pts %>% 
      st_drop_geometry()

repulsion_pts_clean_null <- repulsion_pts_clean %>% 
  st_drop_geometry()

false_repulsion <- anti_join(repulsion_pts_null,
                             repulsion_pts_clean_null )

Solution

If you want the first and last events with in a group, you could use group_by() and first() and last() in the dplyr package.

But what if you wanted to summarize all movements by each individual within year? You could define each case with logic, e.g., west_10km to west_500m is a "closer" movement and west_10km to east_10km is a "long cross".

library(dplyr)
library(tidyr)

df <- df %>%
  tidyr::separate(col = "loc", sep = "_", into = c("side", "dist")) %>%
  mutate(distm = ifelse(dist == "500m", 500, 10000)) %>%
  group_by(id, year) %>%
  mutate(cross = ifelse(side == lag(side), 1, 0)) %>%
  mutate(cross_atmpt = ifelse(side == lag(side) & distm < lag(distm), 1, 0))

Then you can summarize by id and year:

df %>%
  group_by(id, year) %>%
  summarize(tot_cross = sum(cross, na.rm = TRUE),
            tot_cross_atmpt = sum(cross_atmpt, na.rm = TRUE))