Search code examples
rdataframeconditional-statementssubset

R: Select rows by value and always include previous row


I'm trying to subset a data frame to all rows for which a certain column value is '13', but I want all rows preceding a row with '13' to be included too, no matter the value in that column.

I do not want a row to be included twice when it both precedes a row with '13' in the specific column, but also has the value '13' itself.

Here is an example data frame and solution, whereby the condition (subset rows to rows with time = 13 and (time=13)-1, without duplicating)

ID  speed   dist    time
A   4        12     4
B   7        10     8
C   7        18     13
C   8        4      5
A   5        6      13
D   6        2      13
E   7        2      9

Becomes

ID  speed   dist    time
B   7       10      8
C   7       18      13
C   8       4       5
A   5       6       13
D   6       2       13

Solution

  • Create a position index where 'time' value is 13 using which and then subtract 1 from the index and concatenate both to subset

    i1 <- which(df1$time == 13) 
    ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
    ind <- ind[ind >0]
    df1[ind,]
    

    -output

      ID speed dist time
    2  B     7   10    8
    3  C     7   18   13
    4  C     8    4    5
    5  A     5    6   13
    6  D     6    2   13
    

    data

    df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L, 
    7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L, 
    2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)), 
    class = "data.frame", row.names = c(NA, 
    -7L))