
How can I find a subsequent trial based on a condition?


I am using R to manipulate a large dataset (dataset) that consists of 20,000+ rows. In my data, I have three important columns to focus on for this question: Trial_Nr (consisting of 90 trials), seconds (increasing in .02 second increments), and threat (fixation to threat: 1=yes, 0=no, NA). Within each trial, I need to answer: when they initially fixate on threat (1), how long does it take for them to stop fixating on threat (0)? So basically, within each trial, I need to find the first threat=1 and the subsequent threat=0 and subtract the times. I am able to get the first threat with this code:

initalfixthreat <- dataset %>%
  group_by(Trial_Nr) %>%
  slice(which(threat == '1')[1])

I am stumped on how to get the subsequent threat=0 within that trial number.

Here is an example of the data (sorry, I don't know how to format it better):

[image: example data]

So for Trial_Nr=1, I would be interested in 689.9 - 689.8 seconds. For Trial_Nr=2, I would want 690.04 - 689.96.

Please let me know if I was unclear and thank you all for your help!


Solution

  • One approach is:

    library(dplyr)
    
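    df %>%
      group_by(Trial_Nr) %>%
      # drop rows with missing threat so lag() compares observed values only
      filter(!is.na(threat)) %>%
      # flag is 1 on threat rows and -1 on a 0 that directly follows a 1 (0 - 1)
      mutate(flag = ifelse(threat == 1, 1, threat - lag(threat))) %>%
      # keep the first 1 and the first -1 in each trial
      filter(abs(flag) == 1 & !duplicated(flag)) %>%
      # time between the two rows; NA if no later 0 was found
      summarise(timediff = ifelse(length(seconds) == 1, NA, diff(seconds)))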
    
    # A tibble: 2 x 2
      Trial_Nr timediff
         <int>  <dbl>
    1        1 0.1   
    2        2 0.0800
    

    Data:

    df <- structure(list(Trial_Nr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L), seconds = c(689.76, 689.78, 689.8, 689.82, 
    689.84, 689.86, 689.88, 689.9, 689.92, 689.94, 689.96, 689.98, 
    690, 690.02, 690.04), threat = c(0L, 0L, 1L, 1L, 1L, NA, NA, 
    0L, 1L, 0L, 1L, NA, NA, 1L, 0L)), class = "data.frame", row.names = c(NA, 
    -15L))
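A second way to get the same numbers stays closer to the `which(...)[1]` idea from the question. This is only a sketch under a few assumptions: rows are already sorted by seconds within each trial, the names first_on, first_off and result are made up for illustration, and the example data is rebuilt compactly with seq() rather than copied from the structure() call above.

```r
library(dplyr)

# Example data rebuilt compactly (same values as the df above)
df <- data.frame(
  Trial_Nr = rep(1:2, times = c(10, 5)),
  seconds  = seq(689.76, by = 0.02, length.out = 15),
  threat   = c(0, 0, 1, 1, 1, NA, NA, 0, 1, 0, 1, NA, NA, 1, 0)
)

result <- df %>%
  group_by(Trial_Nr) %>%
  summarise(
    # time of the first fixation on threat in this trial (NA if none)
    first_on  = seconds[which(threat == 1)[1]],
    # time of the first threat == 0 row that comes after first_on;
    # which() skips NA, so missing threat values are ignored
    first_off = seconds[which(threat == 0 & seconds > first_on)[1]],
    timediff  = first_off - first_on
  )

result
```

On the example data, timediff should come out at roughly 0.1 for trial 1 and 0.08 for trial 2, matching the result above. Unlike the filter-based pipeline, this version keeps a row (with NA) for trials that never reach threat = 1, which may or may not be what you want.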