
R tidymodels recipes step: from a column with multiple values, create a new column for each value (one-hot encoding)


Say I have this data frame:

library(tidyverse)
# Sample data frame
df <- data.frame(
  id = 1:3,
  fruits = c("apple | oranges", "apple | bananas", "bananas | oranges")
)
df
id            fruits
 1   apple | oranges
 2   apple | bananas
 3 bananas | oranges

I want to split the values in the fruits column and then one-hot encode each distinct value, as follows:

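# Step 1: Split each string on " | " (the pipe is escaped because sep is a regex)
df_separated <- df %>%
  separate_rows(fruits, sep = " \\| ")

# Step 2: Create one logical column per fruit
# (pivot_wider() replaces the superseded spread(); names_sort = TRUE keeps
# the columns in alphabetical order, as spread() did)
df_dummy <- df_separated %>%
  mutate(value = TRUE) %>%
  pivot_wider(names_from = fruits, values_from = value,
              values_fill = FALSE, names_sort = TRUE)

# View the result
print(df_dummy)
id apple bananas oranges
 1  TRUE   FALSE    TRUE
 2  TRUE    TRUE   FALSE
 3 FALSE    TRUE    TRUE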

However, I can't figure out how to turn this code into a recipe step so that I can include it in a tidymodels workflow. Any ideas?


Solution

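library(tidymodels)

# step_dummy_extract() splits each string and creates one indicator column per value.
# `sep` is a regular expression, so the pipe must be escaped:
# sep = " | " would match a single space and produce a spurious "|" column.
dummies_fruit <- recipe(~ fruits, data = df) |>
  step_dummy_extract(fruits, sep = " \\| ") |>
  prep()

# Apply the trained step to the training data
bake(dummies_fruit, new_data = NULL)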
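Since the goal is to use the step inside a tidymodels workflow, here is a minimal sketch of that. It assumes a numeric outcome column `y` that is not in the original data (the `train` and `new_fruits` data frames are hypothetical, invented only so there is something to model), and it uses `linear_reg()` purely as a placeholder model. `step_zv()` is added as a precaution to drop any indicator column that is all zeros in the training data.

```r
library(tidymodels)

# Hypothetical training data: `y` is an invented outcome for illustration only
train <- data.frame(
  y = c(1.2, 3.4, 2.1, 0.5, 2.8, 1.9),
  fruits = c("apple | oranges", "apple | bananas", "bananas | oranges",
             "apple", "bananas | apple", "oranges")
)

# Recipe: split `fruits` on " | " and create one indicator column per fruit,
# then drop any zero-variance indicator columns
rec <- recipe(y ~ fruits, data = train) |>
  step_dummy_extract(fruits, sep = " \\| ") |>
  step_zv(all_predictors())

# Placeholder model; any parsnip model specification could be used here
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg())

# Fitting the workflow trains the recipe and the model together
wf_fit <- fit(wf, data = train)

# New rows are passed through the same trained recipe before prediction
new_fruits <- data.frame(fruits = c("apple | bananas", "oranges"))
preds <- predict(wf_fit, new_data = new_fruits)
print(preds)
```

Because the recipe is part of the workflow, the fruit levels are learned once from the training data, and new data is encoded with the same columns at prediction time.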