r dataframe dplyr gps

How can I compute the duration of events split across different rows of a data frame?


I am working on a data frame of processed GPS positions. It has three variables: id, which identifies each individual; TimeStamp, which gives the time of the GPS signal; and perimeter, which indicates whether the signal occurred within a given perimeter (NA means it occurred outside all perimeters). I would like to create a table with the time spent inside or outside the perimeters, in the order in which the visits occurred.

Here is a reproducible example:

df <- data.frame(id=rep(1, 10),
                 TimeStamp=seq(as.POSIXct("2020-01-01 12:00:00"), 
                               as.POSIXct("2020-01-01 16:30:00"), 
                               length.out = 10),
                 perimeter=c(NA, NA, NA, "p1", "p1", "p1", NA, NA, "p2", "p2"))

And my desired output would be like this:

id perimeter time
1  NA        1.5
1  "p1"      1.5
1  NA        1
1  "p2"      1

I have attempted a solution using the rle() function:

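# rle() treats each NA as a separate run, so recode NA first
df[is.na(df$perimeter),]$perimeter <- "OUT"

# run length * 30 minutes per fix, converted to hours
data.frame(perimeter=rle(df$perimeter)$values,
           time=(rle(df$perimeter)$lengths*30)/60)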

However, this only estimates the time from the number of consecutive duplicates in the vector, assuming one signal every 30 minutes. Since my real data have missing values, subtracting the TimeStamp of the first duplicate in each run from that of the last one would be more accurate.


Solution

  • A dplyr solution:

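    library(dplyr)

    df %>% 
      # Recode NA as an explicit "(Missing)" level so it can be compared and grouped
      mutate(perimeter = forcats::fct_explicit_na(perimeter),
             # Start a new visit on the first row and whenever the perimeter changes
             visit = cumsum(perimeter != lag(perimeter) | is.na(lag(perimeter)))) %>% 
      group_by(id, visit, perimeter) %>% 
      # + 1800 s: one 30-minute sampling interval, so the last fix of a visit counts too
      summarise(time = difftime(max(TimeStamp) + 1800, min(TimeStamp), units = "hours")) %>%
      ungroup() %>% select(-visit)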
    
    #> # A tibble: 4 x 3
    #>      id perimeter time     
    #>   <dbl> <fct>     <drtn>   
    #> 1     1 (Missing) 1.5 hours
    #> 2     1 p1        1.5 hours
    #> 3     1 (Missing) 1.0 hours
    #> 4     1 p2        1.0 hours
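
For comparison, the rle() idea from the question can also work from the actual timestamps rather than from run lengths. The following is only a base R sketch under some assumptions: the rows are already sorted by id and TimeStamp, the "OUT" label for NA is taken from the question, and the names lab, key, span_h and time_h are made up for illustration. The time zone is fixed to UTC purely to make the example reproducible.

```r
# Same reproducible data as in the question (tz set to UTC for reproducibility).
df <- data.frame(
  id = rep(1, 10),
  TimeStamp = seq(as.POSIXct("2020-01-01 12:00:00", tz = "UTC"),
                  as.POSIXct("2020-01-01 16:30:00", tz = "UTC"),
                  length.out = 10),
  perimeter = c(NA, NA, NA, "p1", "p1", "p1", NA, NA, "p2", "p2")
)

# rle() treats every NA as its own run, so recode NA to a label first.
lab <- ifelse(is.na(df$perimeter), "OUT", as.character(df$perimeter))

# Combine id and label so that a run never spans two individuals.
key <- paste(df$id, lab, sep = "|")
runs <- rle(key)

# Row index of the last and first fix of each run.
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

res <- data.frame(
  id = df$id[starts],
  perimeter = lab[starts],
  # Observed span: last fix minus first fix of the run, in hours.
  span_h = as.numeric(difftime(df$TimeStamp[ends], df$TimeStamp[starts],
                               units = "hours"))
)

# Assumption: each fix stands for one 30-minute sampling interval.
# Adding that interval reproduces the desired output from the question.
res$time_h <- res$span_h + 0.5

res
# span_h is 1.0, 1.0, 0.5, 0.5 and time_h is 1.5, 1.5, 1.0, 1.0
```

Note that the plain "last minus first" span is one sampling interval shorter than the desired output, which is why both this sketch and the dplyr answer add 30 minutes. If the sampling interval is irregular, that constant would need to be replaced by something derived from the data.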