I have a data frame of camera trap detection data with over 50 000 rows and I would like to identify and/or remove the observations of one species that occur within a specified time period of another species at the same camera station.
Below is an example of my data frame:
Species StationID DateTime
1 Human A 2013-05-20 10:00:00
2 Dog A 2013-05-20 10:09:00
3 Dog A 2013-05-21 10:40:00
4 Puma B 2013-05-21 15:59:00
5 Dog B 2013-05-23 10:05:00
6 Human B 2013-05-23 10:10:00
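For anyone who wants to reproduce this, the example above can be rebuilt roughly as follows. This is a minimal sketch: the column names follow the printout, and the "UTC" time zone is an assumption, since the real time zone of the data is not stated.

```r
# Rebuild the six example detections shown above.
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(
    c("2013-05-20 10:00:00", "2013-05-20 10:09:00", "2013-05-21 10:40:00",
      "2013-05-21 15:59:00", "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
    tz = "UTC"  # assumption: time zone is not given in the question
  ),
  stringsAsFactors = FALSE
)
```

Storing DateTime as POSIXct rather than character is what makes arithmetic such as "plus or minus 600 seconds" possible later on.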
If I wanted to identify/remove all Dog detections within 10 mins either side of a Human detection at the same camera station then I would expect the following data to be returned:
Species StationID DateTime
1 Human A 2013-05-20 10:00:00
2 Dog A 2013-05-21 10:40:00
3 Puma B 2013-05-21 15:59:00
4 Human B 2013-05-23 10:10:00
Among other things, I tried splitting the Human and Dog detections into separate data frames and adding lower and upper DateTime columns to the Dog observations, set 10 minutes before and after each detection. I then used fuzzy_left_join (from the fuzzyjoin package) as below. The join matched correctly on StationID, but it did not return the correct detections for the DateTime conditions specified.
library(fuzzyjoin)

Dog_HumanDF <- fuzzy_left_join(DogDF, HumanDF,
                               by = c("StationID" = "StationID",
                                      "DateTimeDogLower" = "DateTimeHuman",
                                      "DateTimeDogUpper" = "DateTimeHuman"),
                               match_fun = list(`==`, `<=`, `>=`))
I have searched extensively for similar problems and solutions but could not find anything that suits my purposes. I would prefer a solution that does not require separate data frames to be generated, as I had to do for the fuzzy join. Any help is very much appreciated!
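The solution below was provided by Barrett Wolfe on another platform and it works perfectly. It processes each station in turn and drops any Dog detection that falls within 600 seconds (10 minutes) of a Human detection at that station. I hope it is useful to other users as well.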
# Assumes df$DateTime is POSIXct, so that +/- 600 means 600 seconds.
no_bad_dogs <- function(df) {
  stations <- unique(df$StationID)
  output <- vector("list", length(stations))
  for (i in seq_along(stations)) {
    # All detections at this station; kept unchanged unless a Dog is dropped below
    stat_df <- df[df$StationID == stations[i], ]
    output[[i]] <- stat_df
    if ("Human" %in% stat_df$Species && "Dog" %in% stat_df$Species) {
      human_times <- stat_df$DateTime[stat_df$Species == "Human"]
      dog_index <- which(stat_df$Species == "Dog")
      # TRUE for each Dog detection within 10 min either side of any Human detection
      bad_dog <- vapply(dog_index, function(j) {
        x <- stat_df$DateTime[j]
        any(x >= human_times - 600 & x <= human_times + 600)
      }, logical(1))
      if (any(bad_dog)) {
        output[[i]] <- stat_df[-dog_index[bad_dog], ]
      }
    }
  }
  do.call("rbind", output)
}
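For readers who want the species and the time window as arguments, here is a hedged alternative sketch in base R. It is not Barrett Wolfe's code: `drop_near` and its parameters `target`, `trigger` and `window_secs` are hypothetical names chosen for illustration, and it again assumes DateTime is POSIXct (the "UTC" time zone in the sample data is also an assumption). Within each station it compares every target detection with every trigger detection using `outer()`.

```r
# Hypothetical helper: remove `target` rows that fall within `window_secs`
# seconds of any `trigger` row at the same station.
drop_near <- function(df, target = "Dog", trigger = "Human", window_secs = 600) {
  t_num <- as.numeric(df$DateTime)   # seconds since the epoch
  drop  <- logical(nrow(df))         # FALSE for every row to start with
  # Loop over the row indices belonging to each station
  for (rows in split(seq_len(nrow(df)), df$StationID)) {
    tgt <- rows[df$Species[rows] == target]
    trg <- rows[df$Species[rows] == trigger]
    if (length(tgt) == 0 || length(trg) == 0) next
    # Matrix of absolute time gaps: one row per target, one column per trigger
    gap <- abs(outer(t_num[tgt], t_num[trg], "-"))
    drop[tgt] <- rowSums(gap <= window_secs) > 0
  }
  df[!drop, , drop = FALSE]
}

# Example data from the question (time zone "UTC" is an assumption)
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(
    c("2013-05-20 10:00:00", "2013-05-20 10:09:00", "2013-05-21 10:40:00",
      "2013-05-21 15:59:00", "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

result <- drop_near(df)
```

On the example data this keeps the two Human rows, the Dog at station A on 2013-05-21 and the Puma, which matches the expected output in the question. The pairwise matrix grows with the number of target times the number of trigger detections per station, so for a station with very many detections of both species a sorted or join-based approach may be kinder on memory.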