Search code examples
rdataframegroup-bytidyversedata-wrangling

How to select group based on the order of rows within group in R


For example, I have the following dataframe:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3
3 b 1
3 a 2

I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3

Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?


Solution

  • your_data %>%
      group_by(ID) %>%
      filter(any(variable == "a" & lead(variable, default = "not b") == "b"))
    

    This ignores the order column and is based on the order of rows within each ID group. It check for the presence of an "a" on one row AND a "b" on the immediate next row.

    In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".