I've got survival analysis data, but unfortunately the event itself isn't death. (Well, fortunately for the people in the dataset.)
This means someone may remain in the dataset after their event has occurred. I've figured out, thanks to a prior StackExchange question, how to create a column that returns TRUE for the first occurrence of the event.
But now I want to drop every row that comes after that first occurrence; that is, I want to right-censor.
As an example, this code
library(dplyr)
mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl) %>%
  mutate(first_4 = carb == 4 & !duplicated(carb == 4)) %>%
  arrange(cyl)
gives me
cyl carb first_4
1 4 1 FALSE
2 4 2 FALSE
3 4 2 FALSE
4 4 1 FALSE
5 4 2 FALSE
6 4 1 FALSE
7 4 1 FALSE
8 4 1 FALSE
9 4 2 FALSE
10 4 2 FALSE
11 4 2 FALSE
12 6 4 TRUE
13 6 4 FALSE
14 6 1 FALSE
15 6 1 FALSE
16 6 4 FALSE
17 6 4 FALSE
18 6 6 FALSE
19 8 2 FALSE
20 8 4 TRUE
21 8 3 FALSE
22 8 3 FALSE
23 8 3 FALSE
24 8 4 FALSE
25 8 4 FALSE
26 8 4 FALSE
27 8 2 FALSE
28 8 2 FALSE
29 8 4 FALSE
30 8 2 FALSE
31 8 4 FALSE
32 8 8 FALSE
So far, so good. What I'd like to do, however, is keep all the rows up to and including the TRUE and delete all the rows after it, per group, if and only if TRUE shows up in that group at all. Groups with no TRUE should be kept whole. So, my final dataset would look like this:
cyl carb first_4
1 4 1 FALSE
2 4 2 FALSE
3 4 2 FALSE
4 4 1 FALSE
5 4 2 FALSE
6 4 1 FALSE
7 4 1 FALSE
8 4 1 FALSE
9 4 2 FALSE
10 4 2 FALSE
11 4 2 FALSE
12 6 4 TRUE
13 8 2 FALSE
14 8 4 TRUE
We can add a filter at the end of the pipeline that keeps each group's rows up to and including the first TRUE:
library(dplyr)
mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl) %>%
  mutate(first_4 = carb == 4 & !duplicated(carb == 4)) %>%
  arrange(cyl) %>%
  filter(cumsum(cumsum(first_4)) < 2)
# A tibble: 14 x 3
# Groups: cyl [3]
# cyl carb first_4
# <dbl> <dbl> <lgl>
# 1 4 1 FALSE
# 2 4 2 FALSE
# 3 4 2 FALSE
# 4 4 1 FALSE
# 5 4 2 FALSE
# 6 4 1 FALSE
# 7 4 1 FALSE
# 8 4 1 FALSE
# 9 4 2 FALSE
#10 4 2 FALSE
#11 4 2 FALSE
#12 6 4 TRUE
#13 8 2 FALSE
#14 8 4 TRUE
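To see why the double cumsum works, it may help to run it on a toy vector outside the pipeline. This is only a sketch: the vectors below are made up for illustration, and it relies on first_4 being TRUE at most once per group, which the mutate step guarantees.

```r
# Toy flag vector (illustrative, not taken from mtcars): event at position 3.
first_4 <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

inner <- cumsum(first_4)  # 0 0 1 1 1: switches to 1 at the event and stays there
outer <- cumsum(inner)    # 0 0 1 2 3: equals 1 only on the event row itself
keep  <- outer < 2        # TRUE TRUE TRUE FALSE FALSE

# A group with no event never leaves 0, so every row is kept.
no_event  <- c(FALSE, FALSE, FALSE)
keep_none <- cumsum(cumsum(no_event)) < 2  # TRUE TRUE TRUE
```

A single cumsum(first_4) < 1 would drop the event row as well, which is why the second cumsum is there.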
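Or another option is slice, used in place of the filter step. Groups with no TRUE keep all their rows; the others keep row 1 through the first TRUE:
%>%
  slice(if (!any(first_4)) row_number() else seq_len(which.max(first_4)))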
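The any() guard matters because which.max() on an all-FALSE vector returns 1, which would otherwise cut such a group down to its first row. Here is a rough check of that behaviour and of the two approaches side by side; the names flagged, by_filter and by_slice are just placeholders for this sketch.

```r
library(dplyr)

# which.max() on a logical vector gives the position of the first TRUE...
which.max(c(FALSE, TRUE, FALSE))
# ...but falls back to position 1 when there is no TRUE, hence the guard.
which.max(c(FALSE, FALSE, FALSE))

# Same flagged data as in the pipeline above.
flagged <- mtcars %>%
  select(cyl, carb) %>%
  group_by(cyl) %>%
  mutate(first_4 = carb == 4 & !duplicated(carb == 4)) %>%
  arrange(cyl)

by_filter <- flagged %>%
  filter(cumsum(cumsum(first_4)) < 2)

by_slice <- flagged %>%
  slice(if (!any(first_4)) row_number() else seq_len(which.max(first_4)))

# Both should have the same number of rows (14 for this example).
nrow(by_filter)
nrow(by_slice)
```

Either version keeps the grouping, so add ungroup() afterwards if the next step should not run per cyl.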