I have a dataset that includes IDs
of different products with an assigned factory number to them.
My goal is to create a column to track what products have the same assigned number. This is a summary of the data.
df <- data.frame(id=c(a,b,c,d,e,f,g,a,b,d),
number=c("178","321","178","452","984","321","540","178","321","452"))
df %>% glimpse()
Rows: 8
Columns: 2
$ id <dbl> 1, 2, 3, 4, 5, 6, 1, 2
$ number <chr> "178", "321", "178", "452", "984"~
What I would like to have in the outcome is :
ID shared product what products
a 1 a,c
b 1 b,f
c 1 a,c
d 0
e 0
f 1 b,f
g 0
shared product
is a binary column that indicates if that item has a same number with any of the other products. And the what products
column indicates what those products are.
I hope it makes sense to you!
Thank you so much in advance!
Regards,
Using group_by
+ mutate
you could do:
library(dplyr, warn = FALSE)
df |>
group_by(number) |>
mutate(
shared_product = +(length(id) > 1),
what_products = if_else(shared_product > 0,
paste(unique(id), collapse = ","),
NA_character_
)
) |>
ungroup()
#> # A tibble: 9 × 4
#> id number shared_product what_products
#> <chr> <chr> <int> <chr>
#> 1 a 178 1 a,c
#> 2 b 321 1 b,f
#> 3 c 178 1 a,c
#> 4 d 452 0 <NA>
#> 5 e 984 0 <NA>
#> 6 f 321 1 b,f
#> 7 g 540 0 <NA>
#> 8 a 178 1 a,c
#> 9 b 321 1 b,f