Search code examples
rdplyrgroup-byduplicates

Finding rows with the same values in another column and concatenating them in a separate column


I have a dataset that includes IDs of different products with an assigned factory number to them. My goal is to create a column to track what products have the same assigned number. This is a summary of the data.

df <- data.frame(id=c(a,b,c,d,e,f,g,a,b,d),
           number=c("178","321","178","452","984","321","540","178","321","452"))
df %>% glimpse()
Rows: 8
Columns: 2
$ id     <dbl> 1, 2, 3, 4, 5, 6, 1, 2
$ number <chr> "178", "321", "178", "452", "984"~

What I would like to have in the outcome is :

ID      shared product     what products
 a            1                a,c
 b            1                b,f
 c            1                a,c
 d            0                 
 e            0
 f            1                b,f
 g            0

shared product is a binary column that indicates if that item has a same number with any of the other products. And the what products column indicates what those products are.

I hope it makes sense to you!

Thank you so much in advance!

Regards,


Solution

  • Using group_by + mutate you could do:

    library(dplyr, warn = FALSE)
    
    df |>
      group_by(number) |>
      mutate(
        shared_product = +(length(id) > 1),
        what_products = if_else(shared_product > 0,
          paste(unique(id), collapse = ","),
          NA_character_
        )
      ) |>
      ungroup()
    #> # A tibble: 9 × 4
    #>   id    number shared_product what_products
    #>   <chr> <chr>           <int> <chr>        
    #> 1 a     178                 1 a,c          
    #> 2 b     321                 1 b,f          
    #> 3 c     178                 1 a,c          
    #> 4 d     452                 0 <NA>         
    #> 5 e     984                 0 <NA>         
    #> 6 f     321                 1 b,f          
    #> 7 g     540                 0 <NA>         
    #> 8 a     178                 1 a,c          
    #> 9 b     321                 1 b,f