Search code examples
rdplyrlead

Using lead with dplyr to compute the difference between two time stamps


I want to find the difference between two time stamps based on finding the time stamp in one column based on a condition, "Start", and then finding the time stamp for the first row that meets another condition in that same column, "Stop". Basically we used a program to "start" a behavior and "stop" a behavior so we could calculate the duration of the behavior.

I've tried adapting the code found in this post: subtract value from previous row by group

but I can't figure out how to get the lead to work on meeting a condition in upcoming rows for the same column. It's complicated by the fact that there can be "event" behaviors that have a "start" but no "stop". Example data frame.

Data
Behavior             Modifier_1           Time_relative_s              
BodyLength           Start                122.11      
Growl                Start                129.70
Body Length          Stop                 132.26      
Body Length          Start                157.79      
Body Length          Stop                 258.85      
Body Length          Start                270.12    
Bark                 Start                272.26
Growl                Start                275.68
Body Length          Stop                 295.37

and I want this:

Behavior             Modifier_1           Time_relative_s       diff             
BodyLength           Start                122.11                10.15
Growl                Start                129.70                 
Body Length          Stop                 132.26                
Body Length          Start                157.79                101.06  
Body Length          Stop                 258.85      
Body Length          Start                270.12                25.25    
Bark                 Start                272.26
Growl                Start                275.68
Body Length          Stop                 295.37

I've tried using dplyr pipes:

test<-u%>%
    filter(Modifier_1 %in% c("Start","Stop")) %>%
    arrange(Time_Relative_s) %>%
    mutate(diff = lead(Time_Relative_s, default = first(Time_Relative_s=="Stop")-Time-Relative_s)

But I must not be using lead right because this just returns the Time_Relative_s for me in the diff column. Any suggestions? Thanks for the help!


Solution

  • We may need to create a grouping variable based on the occurrence of 'stop' and then get the difference of 'Time_relative_s' that corresponds to the location of first 'Start', 'Stop' values in 'Modifier_1'

    library(dplyr)
    df1 %>% 
       group_by(grp = cumsum(lag(Modifier_1 == "Stop", default = FALSE))) %>% 
       mutate(diff = Time_relative_s[match("Stop", Modifier_1)] - 
                      Time_relative_s[match("Start", Modifier_1)], 
              diff = replace(diff, row_number() > 1, NA_real_)) %>%
       ungroup %>%
       select(-grp)
    # A tibble: 9 x 4
    #  Behavior    Modifier_1 Time_relative_s  diff
    #  <chr>       <chr>                <dbl> <dbl>
    #1 BodyLength  Start                 122.  10.1
    #2 Growl       Start                 130.  NA  
    #3 Body Length Stop                  132.  NA  
    #4 Body Length Start                 158. 101. 
    #5 Body Length Stop                  259.  NA  
    #6 Body Length Start                 270.  25.2
    #7 Bark        Start                 272.  NA  
    #8 Growl       Start                 276.  NA  
    #9 Body Length Stop                  295.  NA  
    

    data

    df1 <- structure(list(Behavior = c("BodyLength", "Growl", "Body Length", 
    "Body Length", "Body Length", "Body Length", "Bark", "Growl", 
    "Body Length"), Modifier_1 = c("Start", "Start", "Stop", "Start", 
    "Stop", "Start", "Start", "Start", "Stop"), Time_relative_s = c(122.11, 
    129.7, 132.26, 157.79, 258.85, 270.12, 272.26, 275.68, 295.37
    )), row.names = c(NA, -9L), class = "data.frame")