How to define encounter periods by using first and last timestamps within a time series in R


I work with electronically tagged fish. A snippet of my telemetry data (dataframe "d") is below. Each timestamp represents a detection for a unique fish.

TagID          Detection              Location      RiverKm
163            02/23/2012 03:17:44    Alcatraz_E     4.414
163            02/23/2012 03:56:25    Alcatraz_E     4.414
163            04/14/2012 15:10:20    Alcatraz_E     4.414
163            04/14/2012 15:12:11    Alcatraz_N     4.414
163            03/11/2012 08:59:48    Alcatraz_N     4.414
163            03/11/2012 09:02:15    Alcatraz_N     4.414
163            03/11/2012 09:04:05    Alcatraz_N     4.414
163            03/11/2012 09:04:06    Alcatraz_N     4.414
163            03/11/2012 09:06:09    Alcatraz_N     4.414
163            03/11/2012 09:06:11    Alcatraz_E     4.414

There are many different TagIDs (individual fish). I'd like to categorize the detections into encounter periods for each fish by identifying a start time ("arrival") and an end time ("departure"), with a critical value of 1 hour. For example, for the above fish (TagID 163), the output would be:

TagID       arrival                  departure            Location        RiverKm
163        02/23/2012 03:17:44    02/23/2012 03:56:25     Alcatraz_E       4.414 
163        04/14/2012 15:10:20    04/14/2012 15:12:11     Alcatraz_N       4.414
163        03/11/2012 08:59:48    03/11/2012 09:06:11     Alcatraz_E       4.414

I'd like to create a loop (or any other code structure) that does the following:

for j in 1:length(unique(d$TagID))
  1. Identify the time of the first detection ("t1")
  2. IF the next detection for that tag in the time series ("t2") is less than one hour apart from t1, skip it and continue to the next detection; ELSE, place t1 in an "arrival" vector and t2 in a "departure" vector.
  3. Stop when every arrival and departure timestamp has been categorized for each TagID.

I have no idea how to do this in the most efficient way, and would appreciate your help immensely.

Thank you!


Solution

  • First convert your Detection variable to a proper R date-time type, POSIXct, so that the data can be ordered by time. Once the data is ordered, diff and cumsum give you a grouping variable that detects jumps: here a jump occurs whenever more than an hour (60 minutes) separates two consecutive detections. I am using data.table for its concise grouping syntax, but it is not necessary, especially if you don't have a huge amount of data.
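To make the diff and cumsum idea concrete, here is a minimal sketch on four of the detection times from the question. The variable names (times, gaps, new_encounter, id) and the UTC time zone are assumptions made for illustration only.

```r
# Four detection times for one tag, already sorted (UTC is an assumption)
times <- as.POSIXct(c("2012-02-23 03:17:44", "2012-02-23 03:56:25",
                      "2012-03-11 08:59:48", "2012-03-11 09:02:15"),
                    tz = "UTC")

# Gaps between consecutive detections, explicitly converted to minutes
gaps <- as.numeric(diff(times), units = "mins")

# TRUE marks a detection that starts a new encounter (gap above 60 minutes);
# the first detection never starts a jump, hence the leading FALSE
new_encounter <- c(FALSE, gaps > 60)

# The running count of jumps yields one id per encounter
id <- cumsum(new_encounter)
id
# [1] 0 0 1 1
```

Converting the gaps to minutes explicitly matters because diff() on POSIXct values chooses its units automatically, depending on the size of the differences.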

    Here is my complete code:

    library(data.table)
    ## data coercion: parse Detection as POSIXct
    d$Detection <- 
      as.POSIXct(strptime(d$Detection,'%m/%d/%Y %H:%M:%S'))
    ## sort by TagID, then by Detection
    d <- d[order(d$TagID, d$Detection),]
    ## id increments, within each TagID, whenever the gap to the previous
    ## detection exceeds 60 minutes; the units are forced to minutes because
    ## diff() on POSIXct values picks its units automatically
    setDT(d)[, id := cumsum(c(FALSE, as.numeric(diff(Detection), units='mins') > 60)),
             by = TagID]
    ## you don't mention how to choose Location and RiverKm, so I take the first ones by default
    d[,list(start   =Detection[1],
            end     =Detection[length(Detection)],
            Location=Location[1],
            RiverKm =RiverKm[1]),
      by = "TagID,id"]
    
    #    TagID id               start                 end   Location RiverKm
    # 1:   163  0 2012-02-23 03:17:44 2012-02-23 03:56:25 Alcatraz_E   4.414
    # 2:   163  1 2012-03-11 08:59:48 2012-03-11 09:06:11 Alcatraz_N   4.414
    # 3:   163  2 2012-04-14 15:10:20 2012-04-14 15:12:11 Alcatraz_E   4.414
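Since data.table is not strictly necessary, the same grouping can be sketched in base R. This is only an illustration under stated assumptions: the second tag (164) and its detections are hypothetical, UTC is an assumed time zone, and the column names arrival and departure follow the question rather than the code above.

```r
# Small example with the question's columns; TagID 164 is hypothetical
d <- data.frame(
  TagID = c(163, 163, 163, 164, 164),
  Detection = c("02/23/2012 03:17:44", "02/23/2012 03:56:25",
                "03/11/2012 08:59:48", "02/23/2012 03:20:00",
                "02/23/2012 05:00:00"),
  Location = c("Alcatraz_E", "Alcatraz_E", "Alcatraz_N",
               "Alcatraz_E", "Alcatraz_N"),
  RiverKm = 4.414,
  stringsAsFactors = FALSE
)
d$Detection <- as.POSIXct(d$Detection, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
d <- d[order(d$TagID, d$Detection), ]

# Encounter id per tag: a gap of more than 3600 seconds starts a new encounter
d$id <- ave(as.numeric(d$Detection), d$TagID,
            FUN = function(x) cumsum(c(FALSE, diff(x) > 3600)))

# One row per (TagID, id): first and last detection, first Location and RiverKm
groups <- split(d, list(d$TagID, d$id), drop = TRUE)
encounters <- do.call(rbind, lapply(groups, function(g) {
  data.frame(TagID = g$TagID[1], id = g$id[1],
             arrival = g$Detection[1], departure = g$Detection[nrow(g)],
             Location = g$Location[1], RiverKm = g$RiverKm[1])
}))
encounters <- encounters[order(encounters$TagID, encounters$arrival), ]
rownames(encounters) <- NULL
encounters
```

Working on as.numeric(d$Detection) keeps the gaps in seconds, which sidesteps the automatic unit selection of difftime objects. For large telemetry data sets the data.table version is likely to be considerably faster than split and rbind.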