
Filter R data frame conditioned on multiple column values and date & time calculation


I have a data frame of camera trap detection data with over 50 000 rows and I would like to identify and/or remove the observations of one species that occur within a specified time period of another species at the same camera station.

Below is an example of my data frame:

   Species    StationID    DateTime
1  Human      A            2013-05-20 10:00:00
2  Dog        A            2013-05-20 10:09:00
3  Dog        A            2013-05-21 10:40:00
4  Puma       B            2013-05-21 15:59:00
5  Dog        B            2013-05-23 10:05:00
6  Human      B            2013-05-23 10:10:00
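To make the example reproducible, here is a minimal sketch that rebuilds it as an R data frame. It assumes `DateTime` is stored as POSIXct and that the times are in UTC; the time zone is an assumption, not something stated in the question.

```r
# Example detections typed in from the table above (tz = "UTC" is assumed)
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
str(df)
```

With POSIXct times, subtracting two detections gives a time difference, e.g. the first Dog at station A is 9 minutes after the Human.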

If I wanted to identify or remove all Dog detections within 10 minutes either side of a Human detection at the same camera station, I would expect the following data to be returned:

   Species    StationID    DateTime
1  Human      A            2013-05-20 10:00:00
2  Dog        A            2013-05-21 10:40:00
3  Puma       B            2013-05-21 15:59:00
4  Human      B            2013-05-23 10:10:00

Among other things, I tried splitting the Human and Dog detections into separate data frames and creating upper and lower DateTime columns for the Dog observations based on the desired tolerance of ±10 minutes. I then used fuzzy_left_join as below, which matched correctly on StationID but did not return the correct detections for the DateTime conditions I specified.

library(fuzzyjoin)

Dog_HumanDF <- fuzzy_left_join(DogDF, HumanDF,
                               by = c("StationID" = "StationID",
                                      "DateTimeDogLower" = "DateTimeHuman",
                                      "DateTimeDogUpper" = "DateTimeHuman"),
                               match_fun = list(`==`, `<=`, `>=`))
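For comparison, a minimal base-R sketch of the same join logic: merge() plays the role of the `==` condition on StationID, and an absolute time difference of at most 10 minutes replaces the `<=`/`>=` pair. Unlike a left join, which keeps every Dog row, it collects the row IDs of the matched Dog detections so they can be dropped. The names row_id, dogs, humans, pairs and bad_ids are illustrative only, and the data are typed in from the example table (UTC assumed).

```r
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
df$row_id <- seq_len(nrow(df))  # remember each original row

dogs   <- df[df$Species == "Dog",   c("row_id", "StationID", "DateTime")]
humans <- df[df$Species == "Human", c("StationID", "DateTime")]

# Equality condition on StationID (the `==` match_fun): every Dog paired
# with every Human at the same station
pairs <- merge(dogs, humans, by = "StationID", suffixes = c(".dog", ".human"))

# Time-window condition (the `<=` / `>=` match_funs): |dog - human| <= 10 min
gap_min <- abs(as.numeric(difftime(pairs$DateTime.dog, pairs$DateTime.human,
                                   units = "mins")))
bad_ids <- unique(pairs$row_id[gap_min <= 10])

# Drop the flagged Dog rows and the helper column
result <- df[!df$row_id %in% bad_ids, c("Species", "StationID", "DateTime")]
result
```

Because merge() pairs every Dog with every Human at a station, memory grows with the product of the two counts per station, which may matter for busy stations in a 50,000-row data set.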

I have searched extensively for similar problems and solutions but could not find anything that suits my purposes. I would prefer a solution that does not require separate data frames to be generated, as the fuzzy_join did. Any help is very much appreciated!


Solution

  • The solution below was provided by Barrett Wolfe on another platform, and it works perfectly! I hope it is useful to other users as well.

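    # Drop Dog detections within 10 minutes (600 s) either side of a Human
    # detection at the same station. Assumes df$DateTime is POSIXct, so
    # "- 600" / "+ 600" are offsets in seconds. Rows come back grouped by station.
    no_bad_dogs <- function(df) {
      stations <- unique(df$StationID)
      output <- vector("list", length(stations))
      for (i in seq_along(stations)) {
        stat_df <- df[df$StationID == stations[i], ]
        # Only stations with both species can contain "bad" dogs
        if ("Human" %in% stat_df$Species && "Dog" %in% stat_df$Species) {
          human_times <- stat_df$DateTime[stat_df$Species == "Human"]
          dog_index <- which(stat_df$Species == "Dog")
          # TRUE for each dog detection within +/- 600 s of any human detection
          bad_in_dog_index <- sapply(stat_df$DateTime[dog_index], function(x) {
            any(x >= human_times - 600 & x <= human_times + 600)
          })
          if (any(bad_in_dog_index)) {
            output[[i]] <- stat_df[-dog_index[bad_in_dog_index], ]
          } else {
            output[[i]] <- stat_df
          }
        } else {
          output[[i]] <- stat_df
        }
      }
      do.call("rbind", output)
    }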
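If you need to identify rather than only remove the detections, or want the rows to stay in their original order (no_bad_dogs() returns them grouped by station), a variation along these lines may help. It is a sketch, not part of the answer above: flag_near() and its target, reference and window_sec arguments are hypothetical names, and it assumes DateTime is POSIXct. Reference times are sorted once per station and findInterval() locates the two neighbouring reference times for each target detection, so each Dog is compared with at most two Humans instead of all of them.

```r
# Hypothetical helper: TRUE for each `target` detection that lies within
# `window_sec` seconds (either side, inclusive) of a `reference` detection
# at the same station. Row order of df is preserved.
flag_near <- function(df, target = "Dog", reference = "Human", window_sec = 600) {
  flag    <- logical(nrow(df))                   # all FALSE to start
  secs    <- as.numeric(df$DateTime)             # seconds since the epoch
  station <- as.character(df$StationID)
  for (st in unique(station)) {
    ref_t <- sort(secs[station == st & df$Species == reference])
    idx   <- which(station == st & df$Species == target)
    if (length(ref_t) == 0 || length(idx) == 0) next
    # Position of the last reference time <= each target time (0 if none)
    pos  <- findInterval(secs[idx], ref_t)
    # Neighbouring reference times on either side, clamped at the ends
    prev <- ref_t[pmax(pos, 1)]
    nxt  <- ref_t[pmin(pos + 1, length(ref_t))]
    gap  <- pmin(abs(secs[idx] - prev), abs(nxt - secs[idx]))
    flag[idx] <- gap <= window_sec
  }
  flag
}

# Example data typed in from the question's table (UTC assumed)
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

df$near_human <- flag_near(df)    # identify
clean <- df[!df$near_human, ]     # remove
clean
```

The `<=` comparison keeps the boundary inclusive, matching the `>=`/`<=` test in no_bad_dogs(). Other species pairs or windows can be passed as arguments, e.g. `flag_near(df, "Dog", "Puma", 1800)`.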