
R subsetting by unique observation and prioritizing a value


I have a coding problem with subsetting my dataset. I would like to subset my data so that (1) there is exactly one observation per ID and (2) if "event" = 1 occurs in any row for an ID, the retained row for that ID has "event" = 1, while still not losing any IDs.

An example dataset looks like this:

 ID event
 A  1
 A  1
 A  0
 A  1
 B  0
 B  0
 B  0
 C  0
 C  1
 

Desired output:

 ID event
 A  1
 B  0
 C  1

I imagine this would be done with dplyr, e.g. df %>% group_by(ID), but I'm unsure how to prioritize rows where event = 1 without dropping the IDs whose event is always 0. I do not want to lose any of the IDs.
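The group_by() idea can be completed by collapsing each group to a single row with summarise(). A minimal sketch, assuming the data frame is named df and that event only takes the values 0 and 1:

```r
library(dplyr)

# Data from the question (assumed name: df)
df <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# One row per ID: event becomes 1 if any row of that ID has event == 1, else 0.
# summarise() returns one row per group, so no ID is dropped.
result <- df %>%
  group_by(ID) %>%
  summarise(event = as.integer(any(event == 1))) %>%
  ungroup()

print(result)
```

For 0/1 data, summarise(event = max(event)) gives the same result.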

Any help would be appreciated - thank you very much.


Solution

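  • Because event only takes the values 0 and 1, its maximum within an ID is 1 if any row has event = 1 and 0 otherwise, and every ID is kept. In base R we may use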

    aggregate(event ~ ID, df1, max)
       ID event
    1  A     1
    2  B     0
    3  C     1
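aggregate() returns only ID and the aggregated event column. If the real data has more columns, a base R sketch that keeps whole rows is to sort so that event = 1 rows come first within each ID and then drop repeated IDs with duplicated(). The visit column below is a hypothetical extra column, added only to show that full rows survive:

```r
# Data from the question plus a hypothetical extra column (visit)
df1 <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
  visit = 1:9
)

# Sort by ID, then by event descending, so event == 1 rows lead each ID.
# order() leaves remaining ties in their original order.
sorted <- df1[order(df1$ID, -df1$event), ]

# duplicated() flags every repeat of an ID after its first row; keep the first rows
result <- sorted[!duplicated(sorted$ID), ]
rownames(result) <- NULL

print(result)
```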
    

    Or with dplyr

    library(dplyr)
    df1 %>%
       group_by(ID) %>%
       slice_max(event, n = 1, with_ties = FALSE) %>%
       ungroup()
    # A tibble: 3 × 2
      ID    event
      <chr> <int>
    1 A         1
    2 B         0
    3 C         1
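The with_ties = FALSE argument is what guarantees one row per ID. A short sketch comparing both settings on the same data: with the default with_ties = TRUE, slice_max() keeps every row tied at the group maximum, so A (three rows with event = 1) and B (three rows with event = 0) each appear three times.

```r
library(dplyr)

df1 <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Default with_ties = TRUE: all rows tied at the maximum are kept
with_ties <- df1 %>%
  group_by(ID) %>%
  slice_max(event, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per ID
no_ties <- df1 %>%
  group_by(ID) %>%
  slice_max(event, n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(with_ties))  # 7 rows: A x3, B x3, C x1
print(nrow(no_ties))    # 3 rows: one per ID
```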
    

    data

    df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "C", 
    "C"), event = c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)), 
    class = "data.frame", row.names = c(NA, 
    -9L))