So I have some GPS data, where the device turns on and logs the duration of an event whenever said event occurs. At the end of the event, it takes one or two GPS fixes of where it occurred, unless interrupted. To use the data I need to associate each event with a unique ID so that I can then calculate some summaries.
Essentially, what I need to do is add a column to the dataframe which associates each event with a unique ID number.
I want to assign an ascending number to each event that runs from the "event_start" row through to the associated GPS_fix rows, which can be one or two. However, some of the events are interrupted ("event_endint" instead of "event_end"), meaning that either no fixes could be collected or just one fix was established. In the case of no fixes, the numbering of the event_id would just continue on until the next GPS_fix. In the case of one fix, the event would get its own unique ID which extends to the GPS row, and in the event of two fixes, it would extend to both GPS fix rows.
Note, it would be great if the code were dplyr-friendly, as I have multiple GPS units and I will need to distinguish between them. Ultimately, I am going to be calculating summaries, and I intend to do that using dplyr/tidyr.
Below is some example data:
df <- structure(list(date = c("1/01/2000", "1/01/2000", "1/01/2000",
"1/01/2000", "1/01/2000", "1/01/2000", "1/01/2000", "1/01/2000",
"2/01/2000", "2/01/2000", "2/01/2000", "2/01/2000", "3/01/2000",
"3/01/2000", "3/01/2000", "5/01/2000", "5/01/2000", "5/01/2000",
"5/01/2000", "5/01/2000", "5/01/2000", "5/01/2000", "5/01/2000",
"5/01/2000", "5/01/2000", "5/01/2000", "5/01/2000", "5/01/2000",
"5/01/2000"), time = c("1:00", "2:00", "3:00", "4:00", "5:00",
"6:00", "6:01", "6:02", "20:00", "21:00", "22:00", "23:00", "0:00",
"1:00", "6:01", "14:00", "15:00", "16:00", "17:00", "18:00",
"19:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00",
"19:01:00 am", "19:02:00 am"), datatype = c("event_start", "event",
"event", "event", "event", "event_end", "GPS_fix", "GPS_fix",
"event_start", "event", "event", "event", "event", "event_endint",
"GPS_fix", "event_start", "event", "event", "event", "event",
"event_endint", "event_start", "event", "event", "event", "event",
"event_end", "GPS_fix", "GPS_fix"), lat = c(NA, NA, NA, NA, NA,
NA, 42.1, 42.05, NA, NA, NA, NA, NA, NA, 43.1, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 50, 50), long = c(NA, NA, NA,
NA, NA, NA, 180.1, 180.1, NA, NA, NA, NA, NA, NA, 181, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 170, 170)), class = "data.frame", row.names = c(NA,
-29L))
I have had a crack at coding it using the zoo package, but I honestly can't figure out how to do it.
We can use a condition to check whether the datatype is equal to "event_start" and, at the same time, the previous row does not equal "event_endint", then take the cumsum of the logical values.
library(dplyr)

df %>%
  # start a new event_id only at an "event_start" that is not
  # immediately preceded by an interrupted event ("event_endint")
  mutate(event_id = cumsum(grepl("event_start", datatype) &
                             lag(datatype, default = first(datatype)) != "event_endint"))
date time datatype lat long event_id
1 1/01/2000 1:00 event_start NA NA 1
2 1/01/2000 2:00 event NA NA 1
3 1/01/2000 3:00 event NA NA 1
4 1/01/2000 4:00 event NA NA 1
5 1/01/2000 5:00 event NA NA 1
6 1/01/2000 6:00 event_end NA NA 1
7 1/01/2000 6:01 GPS_fix 42.10 180.1 1
8 1/01/2000 6:02 GPS_fix 42.05 180.1 1
9 2/01/2000 20:00 event_start NA NA 2
10 2/01/2000 21:00 event NA NA 2
11 2/01/2000 22:00 event NA NA 2
12 2/01/2000 23:00 event NA NA 2
13 3/01/2000 0:00 event NA NA 2
14 3/01/2000 1:00 event_endint NA NA 2
15 3/01/2000 6:01 GPS_fix 43.10 181.0 2
16 5/01/2000 14:00 event_start NA NA 3
17 5/01/2000 15:00 event NA NA 3
18 5/01/2000 16:00 event NA NA 3
19 5/01/2000 17:00 event NA NA 3
20 5/01/2000 18:00 event NA NA 3
21 5/01/2000 19:00 event_endint NA NA 3
22 5/01/2000 14:00 event_start NA NA 3
23 5/01/2000 15:00 event NA NA 3
24 5/01/2000 16:00 event NA NA 3
25 5/01/2000 17:00 event NA NA 3
26 5/01/2000 18:00 event NA NA 3
27 5/01/2000 19:00 event_end NA NA 3
28 5/01/2000 19:01:00 am GPS_fix 50.00 170.0 3
29 5/01/2000 19:02:00 am GPS_fix 50.00 170.0 3
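Since you mention having multiple GPS units and wanting to compute summaries with dplyr/tidyr afterwards, the same mutate slots straight into a grouped pipeline. Below is a minimal sketch, assuming the combined data has a (hypothetical) unit column identifying each device; the per-event summaries (number of fixes, mean coordinates) are only placeholders for whatever you actually need:

library(dplyr)

df %>%
  group_by(unit) %>%   # `unit` is assumed here; it is not in the example data
  mutate(event_id = cumsum(grepl("event_start", datatype) &
                             lag(datatype, default = first(datatype)) != "event_endint")) %>%
  group_by(unit, event_id) %>%
  summarise(n_fixes   = sum(datatype == "GPS_fix"),
            mean_lat  = mean(lat, na.rm = TRUE),
            mean_long = mean(long, na.rm = TRUE),
            .groups   = "drop")

With the first group_by, the event numbering restarts for each unit, and the second group_by then gives one summary row per unit/event combination.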