Search code examples
rdataframefilterconditional-statementsdata-cleaning

Delete all remaining observations from data frame when exceeding a specific value


I want to delete some observations from my data frame. For each individual (indicated by ID in the following), I have more than one observation and therefore more than one row in the data frame.

The idea is to delete all remaining observations for one individual, when they exceed a specific (pre-defined) value of time.

Here is some example data to explain the problem:

ID <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,7,7)
time <- c(20,23,11,14,15,31, 14, 21, 7, 11, 45, 21, 35, 2, 90, 36, 18, 3, 35, 10, 5,19, 27, 12, 51,91,4,13,9,12,10,7,8,15,6,16)

sim_dat <- data.frame(ID, time)

Now imagine, my specific time-limit would be 30. Then I would like to delete all remaining observations for one individual, when it first exceeded the limit (and important: It doesn't matter, if the deleted values where less or greater than 30).

To be more precise: In the simulated data frame, ID 1 has 8 observations and exceeded the value 30 at his 6th observation. Now, the observations 6,7,8 of ID 1 must be deleted.

For ID 3 no observations should be in the final data frame, as ID 3 exceeded the value 30 within the 1st observation. On the other hand, ID 4 doesn't exceed the value 30, therefore no observations should be deleted.

I hope the problem is clear and I would appreciate any kind of feedback :)

Thank you


Solution

  • Update: Richie's comment-answer is brilliant:

    library(dplyr)
    
    sim_dat |> 
      mutate(flag = cumsum(time > 30), .by = ID) |> 
      filter(flag == 0) |> 
      select(-flag)
    

    Output:

       ID time
    1   1   20
    2   1   23
    3   1   11
    4   1   14
    5   1   15
    6   2    7
    7   2   11
    8   4   10
    9   4    5
    10  4   19
    11  4   27
    12  4   12
    13  6    7
    14  6    8
    15  6   15
    16  7    6
    17  7   16