I want to delete some observations from my data frame. For each individual (indicated by ID in the following), I have more than one observation and therefore more than one row in the data frame.
The idea is to delete all remaining observations for one individual, when they exceed a specific (pre-defined) value of time.
Here is some example data to explain the problem:
ID <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,7,7)
time <- c(20,23,11,14,15,31, 14, 21, 7, 11, 45, 21, 35, 2, 90, 36, 18, 3, 35, 10, 5,19, 27, 12, 51,91,4,13,9,12,10,7,8,15,6,16)
sim_dat <- data.frame(ID, time)
Now imagine, my specific time-limit would be 30. Then I would like to delete all remaining observations for one individual, when it first exceeded the limit (and important: It doesn't matter, if the deleted values where less or greater than 30).
To be more precise: In the simulated data frame, ID 1 has 8 observations and exceeded the value 30 at his 6th observation. Now, the observations 6,7,8 of ID 1 must be deleted.
For ID 3 no observations should be in the final data frame, as ID 3 exceeded the value 30 within the 1st observation. On the other hand, ID 4 doesn't exceed the value 30, therefore no observations should be deleted.
I hope the problem is clear and I would appreciate any kind of feedback :)
Thank you
Update: Richie's comment-answer is brilliant:
library(dplyr)
sim_dat |>
mutate(flag = cumsum(time > 30), .by = ID) |>
filter(flag == 0) |>
select(-flag)
Output:
ID time
1 1 20
2 1 23
3 1 11
4 1 14
5 1 15
6 2 7
7 2 11
8 4 10
9 4 5
10 4 19
11 4 27
12 4 12
13 6 7
14 6 8
15 6 15
16 7 6
17 7 16