Turning vectors of strings in a dataframe into categorical variables in R


I'm fairly new to R, and I'm sure there's a way to do the following without loops, which are what I'm more familiar with.

Take the following example, where you have a set of names and the fruits each person likes:

name <- c("Alice", "Bob")
preference <- list(c("apple", "pear"), c("banana", "apple"))
df <- as.data.frame(cbind(name, preference))

How do I convert it to the following?

apple <- c(1, 1)
pear <- c(1, 0)
banana <- c(0, 1)
df2 <- data.frame(name, apple, pear, banana)

My basic instinct is to first extract all the fruits then do a loop to check if each fruit is in each row's preference:

library(dplyr)

fruits <- unique(unlist(df$preference))
for (fruit in fruits) {
    # rowwise() makes `preference` refer to each row's own vector of fruits
    df <- df %>% rowwise() %>% mutate("{fruit}" := fruit %in% preference)
}

This seems to work, but I'm pretty sure there's a better way to do this.


Solution

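library(dplyr)   # %>%
library(tidyr)   # unnest()
library(tibble)  # rownames_to_column()

# Expand the list columns to one row per (name, fruit) pair, cross-tabulate,
# then turn the table's row names back into a `name` column.
df %>%
  unnest(everything()) %>%
  xtabs(~., .) %>%
  as.data.frame.matrix() %>%
  rownames_to_column('name')

   name apple banana pear
1 Alice     1      0    1
2   Bob     1      1    0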
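
If you would rather not load any packages, the same indicator columns can probably be built in base R as well. The following is only a minimal sketch, not part of the answer above. It assumes `name` is kept as a plain character vector and `preference` as a list (rather than going through `cbind()` as in the question), and the helper name `indicators` is made up for illustration.

```r
# Data from the question, kept as separate objects (assumption: no cbind()).
name <- c("Alice", "Bob")
preference <- list(c("apple", "pear"), c("banana", "apple"))

# All distinct fruits, in order of first appearance.
fruits <- unique(unlist(preference))

# For each person, flag which fruits appear in their preference vector.
# vapply() returns a fruits-by-person matrix, so transpose it to get
# one row per person and one column per fruit.
indicators <- t(vapply(preference,
                       function(p) as.integer(fruits %in% p),
                       integer(length(fruits))))
colnames(indicators) <- fruits

df2 <- data.frame(name, indicators)
print(df2)
```

Here the columns come out in order of first appearance (apple, pear, banana) rather than alphabetically, and the values are 0/1 integers rather than the TRUE/FALSE produced by the loop in the question.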