Tags: r, dplyr, lubridate

How do I drop duplicates within groups that were collected within 60 min of one another?


I have camera trap data from which I want to remove potentially duplicated animal detections. I am setting the interval at 60 min (1 hour): an animal of the same species detected again at the same camera less than 60 min after a previous detection is treated as the same individual. My data were collected at multiple blocks, with multiple sites within each block.

#Data example:
Block<-c("a","a","a","a","a","b","b","b","b","b") # two blocks
Site<-c("p1","p1","p2","p2","p2","p1","p1","p1","p2","p2") # two sites per block
Species<-c("c","c","c","c","e","d","d","c","c","c")
datetime<-c("2021-03-29 05:45:00","2021-03-29 06:40:00","2021-03-30 05:45:00","2021-03-30 07:45:00","2021-03-29 09:45:00","2021-03-29 05:45:00","2021-03-29 05:55:00","2021-03-29 08:45:00","2021-03-29 10:45:00","2021-03-30 10:59:00")
df<-data.frame(Block, Site, Species, datetime)

#what I want for the output: 

   Block Site Species            datetime
1      a   p1       c 2021-03-29 05:45:00
2      a   p2       c 2021-03-30 05:45:00
3      a   p2       c 2021-03-30 07:45:00
4      a   p2       e 2021-03-29 09:45:00
5      b   p1       d 2021-03-29 05:45:00
6      b   p1       c 2021-03-29 08:45:00
7      b   p2       c 2021-03-29 10:45:00
8      b   p2       c 2021-03-30 10:59:00

The tricky part, at least for me, is that I need to remove Block/Site/Species duplicates that occur within 1 hour of one another, so I can't simply keep the first detection in each clock hour of the day.
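For example, the first two detections of species c at block a, site p1 fall in different clock hours but are only 55 minutes apart, so binning by hour would keep both. A short base R check (the variable names are illustrative, and tz = "UTC" is an assumption used to sidestep daylight-saving shifts):

```r
# First two detections of species "c" at block "a", site "p1" (from the example data)
x <- as.POSIXct(c("2021-03-29 05:45:00", "2021-03-29 06:40:00"), tz = "UTC")

# Binning by clock hour puts them in different bins ("05" vs "06"),
# so "keep the first detection per hour" would keep both rows
hour_bin <- format(x, "%Y-%m-%d %H")
print(hour_bin)

# The actual gap between them is 55 minutes, i.e. under the 60-minute threshold
gap <- as.numeric(difftime(x[2], x[1], units = "mins"))
print(gap)
```

What matters is the elapsed time between detections in the same group, not which clock hour they fall in.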

Thank you for your help.


Solution

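  • Sort within each Block/Site/Species group, use diff() to get the gap in minutes to the previous detection, and keep only rows whose gap exceeds intv (the first row of each group is always kept)

    library(dplyr)  # filter(.by = ) requires dplyr >= 1.1.0
    
    intv <- 60  # minimum gap (minutes) between independent detections
    
    df %>% 
      mutate(datetime = as.POSIXct(datetime)) %>% 
      arrange(Block, Site, Species, datetime) %>% 
      # prepend intv + 1 so the first detection in each group always passes;
      # diff() gives the gap in minutes to the previous detection in the group
      filter(c(intv + 1, diff(datetime, units = "mins")) > intv, 
             .by = c(Block, Site, Species))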
      Block Site Species            datetime
    1     a   p1       c 2021-03-29 05:45:00
    2     a   p2       c 2021-03-30 05:45:00
    3     a   p2       c 2021-03-30 07:45:00
    4     a   p2       e 2021-03-29 09:45:00
    5     b   p1       c 2021-03-29 08:45:00
    6     b   p1       d 2021-03-29 05:45:00
    7     b   p2       c 2021-03-29 10:45:00
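    8     b   p2       c 2021-03-30 10:59:00

  Because arrange() sorts by Species, b/p1/c comes out before b/p1/d; otherwise this matches the desired output.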
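The diff() filter compares each detection with the immediately preceding one, so a run of detections each less than 60 minutes apart collapses into its first row, even when a later detection is well over 60 minutes after that first one. If independence should instead be measured from the last kept detection, a sequential pass is needed. A minimal sketch under that assumption; keep_independent() is a hypothetical helper, and the toy data are made up to show such a chain:

```r
library(dplyr)  # filter(.by = ) requires dplyr >= 1.1.0

# Hypothetical helper: TRUE for detections more than `intv` minutes after
# the last *kept* detection (not merely after the previous row).
# Assumes `times` is a POSIXct vector already sorted in ascending order.
keep_independent <- function(times, intv = 60) {
  keep <- logical(length(times))
  last_kept <- NULL
  for (i in seq_along(times)) {
    if (is.null(last_kept) ||
        as.numeric(difftime(times[i], last_kept, units = "mins")) > intv) {
      keep[i] <- TRUE
      last_kept <- times[i]
    }
  }
  keep
}

# Toy data (not from the question): one camera, detections 50 min apart
toy <- data.frame(
  Block = "a", Site = "p1", Species = "c",
  datetime = c("2021-03-29 05:00:00", "2021-03-29 05:50:00",
               "2021-03-29 06:40:00")
)

res <- toy %>%
  mutate(datetime = as.POSIXct(datetime, tz = "UTC")) %>%
  arrange(Block, Site, Species, datetime) %>%
  filter(keep_independent(datetime, 60), .by = c(Block, Site, Species))

print(res)
```

On the toy data the diff() filter keeps only the 05:00 row, while this version keeps 05:00 and 06:40, because 06:40 is 100 minutes after the last kept detection. To treat a gap of exactly 60 minutes as a new detection, as the "<60min" wording in the question suggests, change > to >= here and in the diff() filter.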