r tidyverse survival-analysis

Filtering Time to Event Data in Tidyverse


I'm working with some time-to-event data. For each subject, I'd like to keep the rows from the subject's first day in the study up to and including the first observed event. I'm not concerned with recurrent events after the first one; I only want to explore time to first event.

I'm using between() inside filter(), which has always worked for me in the past. It fails here because some subjects never have the event, so I get this error: Error: Expecting a single value: [extent=0].

I think what I want is a way to keep each subject's rows from entry into the study through the first event or, if the subject never has an event, to keep all of that subject's rows.

Here is an example of what my data looks like:

## data
subject <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "E", "E", "F", "F", "F", "F", "F")
event <- c(0,0,1,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0)

df <- data.frame(subject, event)

## create index to count the days the subject is in the study
library(tidyverse)

df <- df %>%
    group_by(subject) %>%
    mutate(ID = seq_along(subject))

df

# A tibble: 20 x 3
# Groups:   subject [6]
   subject event    ID
   <fct>   <dbl> <int>
 1 A           0     1
 2 A           0     2
 3 A           1     3
 4 A           0     4
 5 B           0     1
 6 B           0     2
 7 C           0     1
 8 C           0     2
 9 C           1     3
10 D           0     1
11 E           0     1
12 E           1     2
13 E           0     3
14 E           1     4
15 E           1     5
16 F           0     1
17 F           0     2
18 F           0     3
19 F           0     4
20 F           0     5

## filter event times between the start of the trial and when the subject has the event for the first time

df %>%
    group_by(subject) %>%
    filter(., between(row_number(), 
        left = which(ID == 1),
        right = which(event == 1)))

The error occurs in this last step.


Solution

  • Is this what you're after?

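    # Keep rows before any event (running count of events is 0), plus the row
    # of the first event itself (running count is 1 and event == 1).
    # Subjects with no event always have a running count of 0, so all their rows stay.
    df2 <- df %>%
      group_by(subject) %>%
      filter(cumsum(event) == 0 | (cumsum(event) == 1 & event == 1))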
    

    Result:

    # A tibble: 16 x 2
    # Groups:   subject [6]
       subject event
       <fct>   <dbl>
     1 A           0
     2 A           0
     3 A           1
     4 B           0
     5 B           0
     6 C           0
     7 C           0
     8 C           1
     9 D           0
    10 E           0
    11 E           1
    12 F           0
    13 F           0
    14 F           0
    15 F           0
    16 F           0
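
For comparison, here is a minimal sketch of an alternative that stays closer to the index-based idea in the question. It first shows the likely cause of the original error: which(event == 1) returns no positions for a subject without an event and several positions for a subject with recurrent events, whereas between() was given these as single bounds. The alternative uses base R's match() with nomatch = n() so that subjects without an event fall back to their last row. The names day, first_event_day, and first_event are illustrative and do not come from the original post.

```r
library(dplyr)

# Rebuild the example data from the question
subject <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "D",
             "E", "E", "E", "E", "E", "F", "F", "F", "F", "F")
event <- c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
df <- data.frame(subject, event)

# which() does not always return exactly one position
which(c(0, 0) == 1)           # subject B, no event: integer(0)
which(c(0, 1, 0, 1, 1) == 1)  # subject E, three events: 2 4 5

# Alternative sketch: locate the first event row within each subject and
# fall back to the last row, n(), when the subject never has an event
first_event <- df %>%
  group_by(subject) %>%
  mutate(day = row_number(),
         first_event_day = match(1, event, nomatch = n())) %>%
  filter(day <= first_event_day) %>%
  ungroup()

# Number of rows kept per subject
count(first_event, subject)
```

On this example data the sketch should keep the same rows as the cumsum() filter above. One design difference: it also leaves the day and first_event_day columns in the result, which can be handy if the time to first event is needed later.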