
Keep only unique IDs in data.frame (remove all IDs with more than one observation)


I want to subset the data.frame below so that only the IDs that appear exactly once are kept:

data <- data.frame(Product=c('A', 'B', 'B', 'C'),
                   Likeability=c(80, 80, 82, 70),
                   Score=c(31, 33, 33, 33),
                   Quality=c(16, 32, 56, 18))

Should turn into:

data
    Product Likeability Score Quality
1       A          80    31      16
2       C          70    33      18

If I use commands like unique(), distinct() or duplicated(), one of the two observations of product B is usually kept. I would like a way to keep only the values that occur exactly once, and that I can apply to a large data.frame. Preferably a dplyr solution, but I am also open to other ideas.


Solution

  • You can group by Product and then filter with the condition n() == 1, which keeps only the groups that contain a single row, like below

    library(dplyr)

    data %>%
      group_by(Product) %>%   # one group per Product ID
      filter(n() == 1) %>%    # keep groups with exactly one row
      ungroup()
    

    which gives

      Product Likeability Score Quality
      <chr>         <dbl> <dbl>   <dbl>
    1 A                80    31      16
    2 C                70    33      18
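
Since the question is also open to non-dplyr ideas, here is a minimal base R sketch of the same filter. It assumes the ID column is Product, as in the example data; the names dup and result are only illustrative. It calls duplicated() from both directions so that every row of a repeated ID is flagged, not just the second and later occurrences.

```r
data <- data.frame(Product=c('A', 'B', 'B', 'C'),
                   Likeability=c(80, 80, 82, 70),
                   Score=c(31, 33, 33, 33),
                   Quality=c(16, 32, 56, 18))

# duplicated() alone flags only the 2nd and later occurrences of a value;
# combining it with fromLast = TRUE also flags the first occurrence,
# so every row whose Product appears more than once is marked.
dup <- duplicated(data$Product) | duplicated(data$Product, fromLast = TRUE)

# Keep only the rows whose Product occurs exactly once.
result <- data[!dup, ]
result
```

Unlike the tibble returned by the dplyr pipeline, this keeps the original row names (1 and 4 here); setting rownames(result) <- NULL renumbers them if that matters.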