Tags: r, dplyr, counter, rle

How can I count the value of previous rows until criterion is reached?


I have a dataset in which each row records one correct or wrong interaction by a participant. I would like to count the number of wrong interactions until the participant has logged a correct answer twice.

My dataframe looks like this:

id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
accuracy = c(0,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1)
timestamp = c(2405.078,2409.575,2414.239,2419.084,2424.138,2428.510,805.5845,812.2674,817.6420,822.5424,828.0416,832.9703,842.2013,456.9943,463.0222,469.0649,475.2177,480.3976,486.9402,491.5632,497.0068)

df <- data.frame(id, accuracy, timestamp)

I was thinking of using the rle function, but I am not sure how to add the condition. I would need to create a new variable that gives me the trial count until the condition is met.

It should look something like this:

ID trial_count
1 5
2 5
3 4

Ideally, I would then add another column that uses the timestamps to calculate the time spent until the condition is reached. Thanks for the help!


Solution

  • Another approach uses cumsum within each group. slice keeps each id's rows up to and including the row where the cumulative sum of accuracy first reaches 2; reframe then counts those rows and takes the difference between the last and first timestamp for each id. If an id has fewer than 2 correct answers, cumsum(accuracy) >= 2 is never TRUE and which.max returns 1, so only the first row is kept (it is unclear what the output should be in that case, or whether it can occur). To count only the wrong answers before the second correct one, subtract 2 from the number of rows.

    library(tidyverse)
    
    df |>
      slice(
        seq_len(which.max(cumsum(accuracy) >= 2)),
        .by = id
      ) |>
      reframe(
        trial_count = n(),
        time_spent = last(timestamp) - first(timestamp),
        .by = id
      )
    

    Or, with older versions of the tidyverse packages (.by and reframe were added in dplyr 1.1.0):

    df %>%
      group_by(id) %>%
      slice(seq_len(which.max(cumsum(accuracy) >= 2))) %>%
      summarise(
        trial_count = n(),
        time_spent = last(timestamp) - first(timestamp)
      )
    

    Output

      id trial_count time_spent
    1  1           5    19.0600
    2  2           5    22.4571
    3  3           7    34.5689
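
The two caveats in the answer (an id that never reaches 2 correct answers, and counting only the wrong answers) can be sketched as follows. This is a minimal variation, not the answer's own code: it swaps which.max for match(TRUE, ...), which returns NA instead of 1 when the condition is never met. The helper second_correct_row and participant 4 with its timestamps are hypothetical and added only for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  accuracy = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1),
  timestamp = c(2405.078, 2409.575, 2414.239, 2419.084, 2424.138, 2428.510,
                805.5845, 812.2674, 817.6420, 822.5424, 828.0416, 832.9703,
                842.2013, 456.9943, 463.0222, 469.0649, 475.2177, 480.3976,
                486.9402, 491.5632, 497.0068)
)

# Hypothetical participant 4 with only one correct answer (not in the question's data)
df_extra <- rbind(
  df,
  data.frame(id = 4, accuracy = c(0, 1, 0), timestamp = c(100, 105, 110))
)

# Hypothetical helper: row index of the second correct answer within one id,
# or NA if the id never logs two correct answers
second_correct_row <- function(accuracy) {
  match(TRUE, cumsum(accuracy) >= 2)
}

result <- df_extra %>%
  group_by(id) %>%
  summarise(
    trial_count = second_correct_row(accuracy),
    # rows up to the stopping row contain exactly 2 correct answers
    wrong_count = trial_count - 2L,
    # indexing with NA yields NA, so incomplete ids get NA here as well
    time_spent = timestamp[trial_count] - first(timestamp)
  )

print(result)
```

For ids 1 to 3 this gives the same trial_count and time_spent as the output above; for the hypothetical id 4, every summary column is NA rather than a misleading count of 1. Whether NA is the right result for such participants depends on the analysis, so treat that choice as an assumption.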