spliting a concatenated text column into binary columns in R (dplyr) when a list of all categories is not known

I have seen similar questions to mine but I could not find a good solution to mine yet in tidyr and dplyr . I have a concatenated column of different text categories where I don't know a complete list of them. It is big data and I could not identify all categories in this column. I need to split them all for each ID and create a corresponding binary column indicating if that category is lined with the ID . this is a sample of the data

df <- data.frame(id=c(1,2,3,4,5),
                   number=c("a,b,d","e,a","c","","k,t"))
df %>% glimpse()

Rows: 5
Columns: 2
$ id     <dbl> 1, 2, 3, 4, 5
$ number <chr> "a,b,d", "e,a", "c", "", "k,t"

my desired data will be like

id a b C d e k t
1  1 1 0 1 0 0 0
2  1 0 1 0 1 0 0
3  0 0 1 0 0 0 0 
4  0 0 0 0 0 0 0 
5  0 0 0 0 0 1 1

Thank you in advance and I hope it is clear.

Solution

This approach uses separate_longer_delim from tidyr to split your text categories and then borrows a convenient dummy encoding function from recipes for one-hot encoding.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(dplyr)
library(tidyr)

df <- data.frame(id=c(1,2,3,4,5),
                 number=c("a,b,d","e,a","c","","k,t"))

df %>%
  separate_longer_delim(number, delim = ',') %>%
  recipe(~.) %>%
  step_dummy(number, one_hot = TRUE) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  group_by(id) %>%
  summarize(across(everything(), max))
#> # A tibble: 5 × 9
#>      id number_X number_a number_b number_c number_d number_e number_k number_t
#>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     1        0        1        1        0        1        0        0        0
#> 2     2        0        1        0        0        0        1        0        0
#> 3     3        0        0        0        1        0        0        0        0
#> 4     4        1        0        0        0        0        0        0        0
#> 5     5        0        0        0        0        0        0        1        1

^{Created on 2023-03-16 with reprex v2.0.2}