I work with electronically tagged fish. A snippet of my telemetry data (dataframe "d") is below. Each timestamp represents a detection for a unique fish.
TagID Detection Location RiverKm
163 02/23/2012 03:17:44 Alcatraz_E 4.414
163 02/23/2012 03:56:25 Alcatraz_E 4.414
163 04/14/2012 15:10:20 Alcatraz_E 4.414
163 04/14/2012 15:12:11 Alcatraz_N 4.414
163 03/11/2012 08:59:48 Alcatraz_N 4.414
163 03/11/2012 09:02:15 Alcatraz_N 4.414
163 03/11/2012 09:04:05 Alcatraz_N 4.414
163 03/11/2012 09:04:06 Alcatraz_N 4.414
163 03/11/2012 09:06:09 Alcatraz_N 4.414
163 03/11/2012 09:06:11 Alcatraz_E 4.414
There are many different TagIDs (individual fish). I'd like to categorize the detections into encounter periods for each fish by identifying a start time ("arrival") and an end time ("departure"), with a critical value of 1 hour between detections. For example, for the above fish (TagID 163), the output would be:
TagID arrival departure Location RiverKm
163 02/23/2012 03:17:44 02/23/2012 03:56:25 Alcatraz_E 4.414
163 04/14/2012 15:10:20 04/14/2012 15:12:11 Alcatraz_N 4.414
163 03/11/2012 08:59:48 03/11/2012 09:06:11 Alcatraz_E 4.414
I'd like to create a loop (or any other code structure) that does the following:
for j in 1:length(unique(d$TagID))
I have no idea how to do this in the most efficient way, and would appreciate your help immensely.
Thank you!
You should first order your data by date. To do that, convert your Detection variable to a valid R date-time type, POSIXct. Once the data is ordered, you can use diff and cumsum to create a grouping variable that detects jumps: here a jump occurs when the gap between consecutive detections is more than an hour (60 minutes). I am using data.table for its concise grouping syntax, but it is not necessary, especially if you don't have a huge amount of data.
Here is my complete code:
library(data.table)
## data coercion
d$Detection <- as.POSIXct(strptime(d$Detection, '%m/%d/%Y %H:%M:%S'))
## sort by Detection
d <- d[order(d$Detection), ]
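## id is an incrementing variable that marks a jump of more than an hour;
## the gap is requested in minutes explicitly because diff() picks its units automatically
d$id <- cumsum(c(FALSE, as.numeric(diff(d$Detection), units = "mins") > 60))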
## you don't mention how to choose Location and RiverKm, so by default I take the first ones
setDT(d)[, list(start    = Detection[1],
                end      = Detection[length(Detection)],
                Location = Location[1],
                RiverKm  = RiverKm[1]),
         by = "TagID,id"]
# TagID id start end Location RiverKm
# 1: 163 0 2012-02-23 03:17:44 2012-02-23 03:56:25 Alcatraz_E 4.414
# 2: 163 1 2012-03-11 08:59:48 2012-03-11 09:06:11 Alcatraz_N 4.414
# 3: 163 2 2012-04-14 15:10:20 2012-04-14 15:12:11 Alcatraz_E 4.414
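Since the question mentions many TagIDs, here is a rough sketch of the same diff/cumsum idea with the gap computed within each fish, so that detections of different fish that interleave in time do not hide or create jumps. It rebuilds the sample rows from the question so it can be run on its own. The tz = "UTC" setting and the name encounters are my own assumptions, not something from the question; use whatever time zone your receivers actually log in.

```r
library(data.table)

# Sample detections copied from the question (TagID 163 only)
d <- data.frame(
  TagID = 163,
  Detection = c("02/23/2012 03:17:44", "02/23/2012 03:56:25",
                "04/14/2012 15:10:20", "04/14/2012 15:12:11",
                "03/11/2012 08:59:48", "03/11/2012 09:02:15",
                "03/11/2012 09:04:05", "03/11/2012 09:04:06",
                "03/11/2012 09:06:09", "03/11/2012 09:06:11"),
  Location = c("Alcatraz_E", "Alcatraz_E", "Alcatraz_E", "Alcatraz_N",
               "Alcatraz_N", "Alcatraz_N", "Alcatraz_N", "Alcatraz_N",
               "Alcatraz_N", "Alcatraz_E"),
  RiverKm = 4.414,
  stringsAsFactors = FALSE
)

setDT(d)

# Assumption: timestamps are treated as UTC to sidestep daylight-saving shifts
d[, Detection := as.POSIXct(Detection, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")]

# Sort by fish first, then by time within each fish
setorder(d, TagID, Detection)

# Within each fish, start a new encounter when the gap exceeds 60 minutes
d[, id := cumsum(c(FALSE, as.numeric(diff(Detection), units = "mins") > 60)),
  by = TagID]

# One row per fish and encounter; Location and RiverKm are taken from the
# first detection of the encounter, as in the code above
encounters <- d[, list(arrival   = Detection[1],
                       departure = Detection[.N],
                       Location  = Location[1],
                       RiverKm   = RiverKm[1]),
                by = list(TagID, id)]
encounters
```

With only one fish in the data this gives the same three encounters as before; the difference only shows up once several TagIDs are present, because id then restarts at 0 for each fish.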