
In R, how do I easily find the first x instances of a variable in a table, ideally using dplyr?


I have the following data frame:

ID <- c(1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,3,3,3,3)
reversal <- c('R0', 'R0', 'R0', 'R0', 'R1', 'R1', 'R1',
              'R0', 'R0', 'R0', 'R0', 'R1', 'R1', 'R1', 'R1',
              'R0', 'R0', 'R0', 'R0', 'R1', 'R1', 'R1', 'R1', 'R2', 'R2', 'R2', 'R2')
event <- c(0, 0, 0, 0, 1, 0, 1,
           0, 0, 0, 0, 1, 1, 1, 0,
           0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1)

RT <- runif(27)   # random reaction times
df <- data.frame(ID, reversal, event, RT)

I would like to identify/label, for each ID and each reversal, the first 4 instances of 0 and the first 4 instances of 1 in event, so that I can, for example, calculate the mean RT over those first 4 instances of 0 and of 1. Thank you!


Solution

  • You can achieve the same result by putting the logic directly into the filter() call; there is no need to mutate() or to create intermediate objects:

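    # Keep the first 4 rows of each ID x reversal x event combination,
    # then average RT within each combination
    df %>%
      filter(row_number() <= 4, .by = c(ID, reversal, event)) %>%
      summarise(mean_RT = mean(RT, na.rm = TRUE), .by = c(ID, reversal, event))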
    

    Or, with dplyr versions older than 1.1.0 (which introduced the .by argument), group explicitly:

    df %>%
      group_by(ID, reversal, event) %>%
      filter(row_number() <= 4) %>%                      # first 4 rows per group
      summarise(mean_RT = mean(RT, na.rm = TRUE)) %>%
      ungroup()                                          # drop the leftover grouping
    

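    Both give the same group means as your mean_values. The .by version lists the groups in the order they first appear in the data, whereas group_by() sorts them by ID, reversal and event; the exact numbers will also differ on each run because RT is drawn with runif():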

       ID reversal event    mean_RT
    1   1       R0     0 0.53447121
    2   1       R1     1 0.64449169
    3   1       R1     0 0.91095299
    4   2       R0     0 0.78950270
    5   2       R1     1 0.33560007
    6   2       R1     0 0.89449568
    7   3       R0     0 0.70057488
    8   3       R1     1 0.31019288
    9   3       R1     0 0.49802194
    10  3       R2     1 0.05643762
    11  3       R2     0 0.07677300
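The question also asks to label the first 4 instances, not only to filter them. A minimal sketch of that, assuming dplyr 1.1.0 or later (for .by) and assuming the rows are already in trial order (row_number() counts in the current row order, so arrange() first if they are not), is to add the counter with mutate() and filter on it afterwards. The column names instance, first4 and n are only illustrative choices, not anything from the original answer:

```r
library(dplyr)

# Same example data as in the question; RT is random, so the means vary per run
ID <- rep(c(1, 2, 3), times = c(7, 8, 12))
reversal <- rep(c("R0", "R1", "R0", "R1", "R0", "R1", "R2"),
                times = c(4, 3, 4, 4, 4, 4, 4))
event <- c(0, 0, 0, 0, 1, 0, 1,
           0, 0, 0, 0, 1, 1, 1, 0,
           0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1)
df <- data.frame(ID, reversal, event, RT = runif(27))

# Label every row: `instance` counts occurrences of each event value within
# ID x reversal (in row order); `first4` flags the first four of them.
# mutate() keeps the original row order.
labelled <- df %>%
  mutate(instance = row_number(),
         first4 = instance <= 4,
         .by = c(ID, reversal, event))

# Mean RT over the flagged rows only, plus how many rows went into each mean
means <- labelled %>%
  filter(first4) %>%
  summarise(mean_RT = mean(RT, na.rm = TRUE),
            n = n(),
            .by = c(ID, reversal, event))

print(means)
```

Note that in this small example no ID x reversal x event combination has more than four rows, so every row gets first4 = TRUE and the filter drops nothing; on real data with longer runs it keeps only the first four occurrences of each event value.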