How can I count the value of previous rows until a criterion is reached?

I have a dataset where each row indicates one correct/wrong interaction of a participant. I would like to count the number of wrong interactions until the participant has logged a correct answer twice.

My dataframe looks like this:

id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
accuracy = c(0,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1)
timestamp = c(2405.078,2409.575,2414.239,2419.084,2424.138,2428.510,805.5845,812.2674,817.6420,822.5424,828.0416,832.9703,842.2013,456.9943,463.0222,469.0649,475.2177,480.3976,486.9402,491.5632,497.0068)

df <- data.frame(id, accuracy, timestamp)

I was thinking of using the rle function, but I am not sure how to add the condition. I would need to create a new variable that gives me the trial count until the condition is met.

It should look something like this:

ID	trial_count
1	5
2	5
3	4

Ideally I would then add another column that calculates the time spent until condition is reached with the timestamps. Thanks for the help!

Solution

Another approach uses cumsum() within each group. slice() keeps each participant's rows up to and including the row where the cumulative sum of accuracy first reaches 2. reframe() then counts the remaining rows and computes the time difference for each id. Note that if an id has fewer than 2 correct answers, which.max() finds no TRUE and returns 1, so only the first row is kept (it is unclear what the output should be in that case, if it can occur at all). To count only the wrong answers before the second correct one, subtract 2 from the number of rows.

library(tidyverse)

df |>
  slice(
    seq_len(which.max(cumsum(accuracy) >= 2)),
    .by = id
  ) |>
  reframe(
    trial_count = n(),
    time_spent = last(timestamp) - first(timestamp),
    .by = id
  )
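
To see what the expression inside slice() does, here is a small base R sketch on participant 1's accuracy values alone. The variable names (acc, running_correct, and so on) are only for illustration and are not part of the answer's code.

```r
# Accuracy values for participant 1, copied from the question's data
acc <- c(0, 1, 0, 0, 1, 1)

# Running total of correct answers after each trial
running_correct <- cumsum(acc)

# TRUE from the trial where the second correct answer is logged
reached <- running_correct >= 2

# which.max() returns the position of the first TRUE (the first maximum)
stop_row <- which.max(reached)

# Row indices that slice() keeps: 1 up to and including stop_row
kept_rows <- seq_len(stop_row)

# Caveat: with no TRUE at all, which.max() still returns 1,
# so a participant with fewer than 2 correct answers keeps one row
never_reached <- which.max(cumsum(c(0, 1, 0)) >= 2)
```

For this participant stop_row is 5, so rows 1 to 5 are kept, which matches the trial_count of 5 in the output.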

Or, with older versions of the tidyverse packages (before dplyr 1.1.0, which added .by and reframe()):

df %>%
  group_by(id) %>%
  slice(seq_len(which.max(cumsum(accuracy) >= 2))) %>%
  summarise(
    trial_count = n(),
    time_spent = last(timestamp) - first(timestamp)
  )

Output

  id trial_count time_spent
1  1           5    19.0600
2  2           5    22.4571
3  3           7    34.5689
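
If you also want the number of wrong answers, and want NA rather than a misleading count for a participant who never reaches 2 correct answers, one possible variant is sketched below. It assumes NA is the desired result in that case (the question does not say), and wrong_count is a column name introduced here for illustration. It swaps which.max() for match(TRUE, ...), which returns NA when there is no TRUE.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3),
  accuracy = c(0,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1),
  timestamp = c(2405.078,2409.575,2414.239,2419.084,2424.138,2428.510,
                805.5845,812.2674,817.6420,822.5424,828.0416,832.9703,842.2013,
                456.9943,463.0222,469.0649,475.2177,480.3976,486.9402,491.5632,497.0068)
)

result <- df %>%
  group_by(id) %>%
  summarise(
    # Row index of the second correct answer within the group;
    # match() gives NA if the criterion is never reached (assumed desired here)
    trial_count = match(TRUE, cumsum(accuracy) >= 2),
    # Exactly 2 of the counted trials are correct, so the rest are wrong
    wrong_count = trial_count - 2L,
    # Time from the first trial to the trial that meets the criterion
    time_spent = timestamp[trial_count] - timestamp[1]
  )

result
```

With the question's data, trial_count and time_spent agree with the output above, and wrong_count is 3, 3 and 5 for ids 1, 2 and 3.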