
Create numerically encoded dummy variables efficiently in R?


How can we transform data of the form

df <- structure(list(customer_number = c(3, 3, 1, 1, 3),
                     item = c("milkshake", "burger", "apple", "burger", "water")),
                row.names = c(NA, -5L), class = "data.frame")


#   customer_number      item
# 1               3 milkshake
# 2               3    burger
# 3               1     apple
# 4               1    burger
# 5               3     water

into numerically encoded dummy variables, with one row per customer and one 0/1 column per item, like this?


data.frame(customer_number=c(1,3),
           item_milkshake=c(0,1),
           item_burger=c(1,1),
           item_apple=c(1,0),
           item_water=c(0,1))

#   customer_number item_milkshake item_burger item_apple item_water
# 1               1              0           1          1          0
# 2               3              1           1          0          1

Solution

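  • Add a helper column n that is 1 for every row, then pivot the data to wide format so that each item becomes its own column. Customer/item combinations that never occur are filled with 0. If a customer can appear with the same item more than once, deduplicate the rows first (for example with distinct()), otherwise pivot_wider() warns and returns list-columns.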

    library(dplyr)
    
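    df %>%
      mutate(n = 1) %>%                 # mark every purchased item with 1
      arrange(customer_number) %>%      # so customers come out in ascending order
      tidyr::pivot_wider(names_from = item, values_from = n,
                         values_fill = list(n = 0),  # items not bought become 0
                         names_prefix = "item_")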
    
    # A tibble: 2 x 5
    #  customer_number item_apple item_burger item_milkshake item_water
    #            <dbl>      <dbl>       <dbl>          <dbl>      <dbl>
    #1               1          1           1              0          0
    #2               3          0           1              1          1
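
For comparison, here is a minimal base R sketch of the same reshaping, assuming no packages are available. It is not part of the answer above, and the names counts, indicator, and result are only illustrative. It cross-tabulates customers against items with table() and turns the counts into 0/1 flags, so repeated purchases of the same item still give 1.

```r
# Same example data as in the question.
df <- data.frame(
  customer_number = c(3, 3, 1, 1, 3),
  item = c("milkshake", "burger", "apple", "burger", "water")
)

# Cross-tabulate: rows are customers, columns are items, cells are counts.
counts <- table(df$customer_number, df$item)

# Turn counts into 0/1 flags; unclass() drops the "table" class so that
# as.data.frame() treats the result as an ordinary matrix (wide, not long).
indicator <- unclass(counts > 0) * 1

result <- as.data.frame(indicator)
names(result) <- paste0("item_", names(result))

# Recover the customer id from the row names and put it first.
result <- cbind(customer_number = as.numeric(rownames(result)), result)
rownames(result) <- NULL

result
```

The item columns come out in alphabetical order here, because table() sorts the item values, which happens to match the column order of the pivot_wider() output above.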