I have a dataset where each row records one correct or wrong interaction of a participant. I would like to count the number of wrong interactions until the participant has logged a correct answer twice.
My dataframe looks like this:
id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
accuracy = c(0,1,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1)
timestamp = c(2405.078,2409.575,2414.239,2419.084,2424.138,2428.510,805.5845,812.2674,817.6420,822.5424,828.0416,832.9703,842.2013,456.9943,463.0222,469.0649,475.2177,480.3976,486.9402,491.5632,497.0068)
df <-data.frame(id, accuracy, timestamp)
I was thinking of using the rle function, but I am not sure how to add the condition. I would need to create a new variable that gives me the trial count until the condition is met.
It should look something like this:
ID | trial_count |
---|---|
1 | 5 |
2 | 5 |
3 | 4 |
Ideally I would then add another column that uses the timestamps to calculate the time spent until the condition is reached. Thanks for the help!
Another approach uses `cumsum` within each group. `slice` keeps the rows for each `id` up to and including the row where the cumulative sum of `accuracy` first reaches 2. `reframe` then counts the remaining rows and calculates the time difference for each `id`. The result will not be meaningful if an `id` has fewer than 2 correct answers: `which.max` returns 1 when no element is `TRUE`, so only the first row would be kept (it is unclear what the output should be in this case, if it can occur at all). Also, to count only the wrong answers until the second correct one, subtract 2 from the number of rows.
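To make the cutoff logic concrete, here is a minimal base R sketch of the intermediate steps for a single participant. It uses the `accuracy` values of `id` 1 from the question; the variable names (`running_correct`, `cutoff`, `kept_rows`, `no_second_correct`) are only illustrative.

```r
# Accuracy values for id 1, taken from the question's data
accuracy <- c(0, 1, 0, 0, 1, 1)

# Running count of correct answers
running_correct <- cumsum(accuracy)        # 0 1 1 1 2 3

# which.max() on a logical vector returns the index of the first TRUE,
# i.e. the row where the second correct answer is logged
cutoff <- which.max(running_correct >= 2)  # 5

# Row positions that slice() would keep for this participant
kept_rows <- seq_len(cutoff)               # 1 2 3 4 5

# Caveat: when no element is TRUE, which.max() returns 1 rather than NA
no_second_correct <- which.max(cumsum(c(0, 1, 0)) >= 2)  # 1
```

The last line shows why participants with fewer than two correct answers need separate handling.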
library(tidyverse)
df |>
  slice(
    seq_len(which.max(cumsum(accuracy) >= 2)),
    .by = id
  ) |>
  reframe(
    trial_count = n(),
    time_spent = last(timestamp) - first(timestamp),
    .by = id
  )
Or, with older versions of the tidyverse packages (before dplyr 1.1.0, which introduced `.by` and `reframe`):
df %>%
  group_by(id) %>%
  slice(seq_len(which.max(cumsum(accuracy) >= 2))) %>%
  summarise(
    trial_count = n(),
    time_spent = last(timestamp) - first(timestamp)
  )
Output
  id trial_count time_spent
1  1           5    19.0600
2  2           5    22.4571
3  3           7    34.5689
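If participants with fewer than two correct answers can occur, one possible variant returns `NA` for them instead of a misleading count, and also reports the number of wrong answers directly. This is only a sketch: `second_correct_row` is a hypothetical helper, not part of dplyr, and participant 4 (one correct answer, made-up timestamps) is added purely to illustrate the edge case.

```r
library(dplyr)

# Data from the question, plus a hypothetical participant 4 who never
# reaches a second correct answer (illustration only)
df <- data.frame(
  id = c(1,1,1,1,1,1, 2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3, 4,4,4),
  accuracy = c(0,1,0,0,1,1, 1,0,0,0,1,1,1, 0,0,0,0,0,1,1,1, 0,1,0),
  timestamp = c(2405.078, 2409.575, 2414.239, 2419.084, 2424.138, 2428.510,
                805.5845, 812.2674, 817.6420, 822.5424, 828.0416, 832.9703,
                842.2013, 456.9943, 463.0222, 469.0649, 475.2177, 480.3976,
                486.9402, 491.5632, 497.0068, 10, 20, 30)
)

# Hypothetical helper: position of the row holding the second correct answer,
# or NA if there are fewer than two correct answers
second_correct_row <- function(accuracy) {
  hits <- which(cumsum(accuracy) >= 2)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

result <- df %>%
  group_by(id) %>%
  summarise(
    trial_count = second_correct_row(accuracy),
    # exactly two of the counted rows are correct, the rest are wrong
    wrong_count = trial_count - 2L,
    time_spent = timestamp[trial_count] - first(timestamp)
  )

print(result)
```

For ids 1 to 3 this gives the same `trial_count` and `time_spent` as above, with `wrong_count` of 3, 3 and 5; the hypothetical id 4 gets `NA` in all three columns.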