r tidyverse aggregate

Determine time of event in R


I'm wondering what the most effective way to determine event time in R is. Other answers didn't seem to handle grouping by ID, and that's a big chunk of what I need.

My data looks something like this:

time = rep(c(1:5),2)
id = c(rep(1,5),rep(2,5))
event = c(0,0,0,1,1,0,1,1,1,1)

df = data.frame(cbind(time,id,event))
df
   time id event
1     1  1     0
2     2  1     0
3     3  1     0
4     4  1     1
5     5  1     1
6     1  2     0
7     2  2     1
8     3  2     1
9     4  2     1
10    5  2     1

Here "event" is a binary observation (death of an individual in the experiment), and it should remain 1 once it is first observed.

I need to determine the first time at which event == 1 for each id number, and generate a vector of the id number and the time at which event == 1 was first observed.

I was originally going to sloppily subset the data where event == 1 and then just pick the minimum value of the week for each id, but that gets even sloppier when grouping by ID. Then I tried some aggregating but also struggled to do it across the ID grouping. I know tidyverse has some options but I'm a n00b.

I'm sure there's a very straightforward way to do this. Thanks!


Solution

  • The tidyverse is indeed super helpful for this kind of stuff.

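    library(dplyr)

    df %>%
      filter(event == 1) %>%             # keep only rows where the event has occurred
      group_by(id) %>%
      arrange(id, time) %>%              # sort so the earliest time comes first within each id
      summarise(time = first(time)) %>%  # earliest event time per id
      ungroup()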
    

    # A tibble: 2 x 2
         id  time
      <dbl> <dbl>
    1     1     4
    2     2     2
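
For comparison, the "subset where event == 1, then take the minimum time per id" idea from the question can also be done in base R with `aggregate()`. This is only a minimal sketch on the example data; the name `first_event` is made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data from the question.
df <- data.frame(
  time  = rep(1:5, 2),
  id    = rep(1:2, each = 5),
  event = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1)
)

# Keep only rows where the event has occurred, then take the
# smallest time within each id.
first_event <- aggregate(time ~ id, data = df[df$event == 1, ], FUN = min)
first_event
#   id time
# 1  1    4
# 2  2    2
```

Note that with either approach, an id whose event never reaches 1 has no rows left after the filter, so it will simply be missing from the result.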