I am working on a dataframe with processed GPS positions. I have three variables: id, which corresponds to the identifier of each individual; TimeStamp, which indicates the moment of the GPS signal; and perimeter, which indicates whether the signal occurred within a given perimeter. I would like to create a table with the time spent inside or outside the given perimeters, in the order of occurrence.
Here is a reproducible example:
df <- data.frame(id=rep(1, 10),
                 TimeStamp=seq(as.POSIXct("2020-01-01 12:00:00"),
                               as.POSIXct("2020-01-01 16:30:00"),
                               length.out = 10),
                 perimeter=c(NA, NA, NA, "p1", "p1", "p1", NA, NA, "p2", "p2"))
And my desired output would be like this:
id perimeter time
1 NA 1.5
1 "p1" 1.5
1 NA 1
1 "p2" 1
I have approached a solution using the rle() function:
df[is.na(df$perimeter),]$perimeter <- "OUT"
# each position is counted as 30 minutes, then converted to hours
data.frame(perimeter=rle(df$perimeter)$values,
           time=(rle(df$perimeter)$lengths*30)/60)
However, this only estimates the time from the length of each sequence of duplicates in the vector, assuming one position every 30 minutes. As I have missing values, subtracting the TimeStamp of the first duplicate from that of the last one would be more accurate.
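As a rough sketch of that idea in base R, the runs found by rle() can be turned into a run identifier and the first timestamp of each run subtracted from the last. The names status, run_id, first, last and res are introduced here only for illustration, the time zone is fixed to UTC as an assumption, and the sketch handles a single id (several individuals would need the id included in the run definition).

```r
# Same example data as above; stringsAsFactors = FALSE keeps perimeter as character
df <- data.frame(id = rep(1, 10),
                 TimeStamp = seq(as.POSIXct("2020-01-01 12:00:00", tz = "UTC"),
                                 as.POSIXct("2020-01-01 16:30:00", tz = "UTC"),
                                 length.out = 10),
                 perimeter = c(NA, NA, NA, "p1", "p1", "p1", NA, NA, "p2", "p2"),
                 stringsAsFactors = FALSE)

# Recode NA as "OUT": rle() would otherwise put every NA in a run of its own
status <- ifelse(is.na(df$perimeter), "OUT", df$perimeter)

# Number each run of identical consecutive values: 1 1 1 2 2 2 3 3 4 4
runs <- rle(status)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# First and last timestamp of each run, in seconds since the epoch
first <- tapply(as.numeric(df$TimeStamp), run_id, min)
last  <- tapply(as.numeric(df$TimeStamp), run_id, max)

res <- data.frame(perimeter = runs$values,
                  time = as.numeric(last - first) / 3600)  # seconds to hours
res
```

Note that this counts only the span between the first and last position of each run, so it gives 1, 1, 0.5 and 0.5 hours rather than the 1.5, 1.5, 1 and 1 of the desired output; one sampling interval (30 minutes here) would have to be added per run to match it.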
A dplyr solution:
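library(dplyr)

df %>%
  # make NA an explicit factor level, then number each run of identical perimeters
  mutate(perimeter = forcats::fct_explicit_na(perimeter),
         visit = cumsum(perimeter != lag(perimeter) | is.na(lag(perimeter)))) %>%
  group_by(id, visit, perimeter) %>%
  # + 1800 s: assumes one position every 30 minutes, so the last position counts in full
  summarise(time = difftime(max(TimeStamp) + 1800, min(TimeStamp), units = "hours")) %>%
  ungroup() %>%
  select(-visit)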
#> # A tibble: 4 x 3
#> id perimeter time
#> <dbl> <fct> <drtn>
#> 1 1 (Missing) 1.5 hours
#> 2 1 p1 1.5 hours
#> 3 1 (Missing) 1.0 hours
#> 4 1 p2 1.0 hours
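To see how the visit counter separates consecutive stays, here is a small base R sketch of the same cumsum() trick on the perimeter labels alone. The vectors p, prev and new_run are names made up for this illustration, and prev is built by hand to mimic what dplyr's lag() returns.

```r
# Perimeter labels from the example, with NA already recoded as "(Missing)"
p <- c("(Missing)", "(Missing)", "(Missing)", "p1", "p1", "p1",
       "(Missing)", "(Missing)", "p2", "p2")

# Previous value of each element; the first element has none, hence NA
prev <- c(NA, head(p, -1))

# TRUE wherever a new run starts: the value changed or there is no predecessor
new_run <- p != prev | is.na(prev)

# The cumulative sum of the run starts numbers the runs
visit <- cumsum(new_run)
visit
#> [1] 1 1 1 2 2 2 3 3 4 4
```

Grouping by this counter in addition to perimeter is what keeps the two separate (Missing) periods from being merged into one row.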