Tags: r, tidyverse, distinct, group

Direct way for 'distinct() groupwise'


I would like to do the equivalent of distinct(), but for whole groups: when several groups contain the same set of procedures, only one of them should be kept. Here is an example:

data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

  group procedure
1     1         A
2     1         B
3     2         A
4     3         A
5     3         B
6     4         A
7     4         X
8     5         A
9     5         X

This is the output I expect:

Note: group_id is only an intermediate helper column and is not important.

  group procedure group_id
  <dbl> <chr>        <int>
1     1 A                2
2     1 B                2
3     2 A                1
4     4 A                3
5     4 X                3

I use this working code:

library(dplyr)
library(tidyr)

data %>%
  summarise(procedure = toString(sort(procedure)), .by = group) %>%
  mutate(group_id = as.integer(factor(procedure))) %>% 
  distinct(group_id, .keep_all = TRUE) %>% 
  separate_rows(procedure)

Is there a more direct method? For context, my dataset has 23,000 rows and many groups, and I need to identify and evaluate the main member of each group, so I am looking for an efficient way to find and assess all unique groups.


Solution

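  • I don't know if this is short enough for you, but you can store each group's sorted procedures in a list column, drop every group whose list repeats an earlier one, and unnest the rest: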

    data %>%
        summarise(procedure = list(sort(procedure)), .by = group) %>%
        filter(!duplicated(procedure)) %>%
        unnest(procedure)
    

    which gives

    # A tibble: 5 × 2
      group procedure
      <dbl> <chr>
    1     1 A
    2     1 B
    3     2 A
    4     4 A
    5     4 X
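
To make it easier to see why this works, here is a minimal sketch that splits the same pipeline into its steps, using the example data from the question. The object names (sets, is_repeat, result, set_sizes) and the columns set_key and n_groups are only illustrative and do not come from the original code. The last step is an optional extra that assumes you also want to know how many groups share each distinct set.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

# Step 1: one row per group, with its sorted procedures in a list column.
sets <- data %>%
  summarise(procedure = list(sort(procedure)), .by = group)

# Step 2: duplicated() compares the list elements as whole vectors,
# so a group is flagged when an earlier group has the same procedures.
is_repeat <- duplicated(sets$procedure)

# Step 3: keep the first group of each distinct set and go back to long form.
result <- sets %>%
  filter(!is_repeat) %>%
  unnest(procedure)

# Optional (illustrative): count how many groups share each distinct set.
set_sizes <- sets %>%
  mutate(set_key = vapply(procedure, toString, character(1))) %>%
  count(set_key, name = "n_groups")
```

Sorting inside each group is what makes the comparison independent of row order, and the list column avoids pasting the procedures into one string and splitting them again afterwards.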