Tags: r, list, one-hot-encoding

Encode numbers into categorical vectors


I have a vector of integers y <- c(1, 2, 3, 3) and I want to convert it into a one-hot encoded list like this:

1 0 0 
0 1 0
0 0 1
0 0 1

I tried to find a solution with to_categorical, but I ran into problems with data types. Does anyone know a smart and smooth solution for this task?

This is my try:

 for (i in 1:length(y)) {
   one_character <- list(as.vector(to_categorical(y[[i]], num_classes = 3)))
   list_test <- rbind(list_test, one_character)
 }

but I get the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  IndexError: index 3 is out of bounds for axis 1 with size 3

Solution

  • Here is one way in base R: create a matrix of 0s, then assign 1 at each position whose row index comes from the sequence along y and whose column index is the corresponding y value

    m1 <- matrix(0, length(y), max(y))
    m1[cbind(seq_along(y), y)] <- 1
    m1
    #      [,1] [,2] [,3]
    #[1,]    1    0    0
    #[2,]    0    1    0
    #[3,]    0    0    1
    #[4,]    0    0    1
    

    In base R, we can also do

    table(seq_along(y), y)
    #  y
    #    1 2 3
    #  1 1 0 0
    #  2 0 1 0
    #  3 0 0 1
    #  4 0 0 1
    

    Or another option is model.matrix, which also ships with R (in the stats package)

    model.matrix(~factor(y) - 1)
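
The matrix-indexing approach can be wrapped in a small helper if the number of classes should be fixed in advance. The following is only a sketch: the name `one_hot` and its `num_classes` argument are made up for illustration, and it assumes the labels are 1-based integers. It also checks that `model.matrix` yields the same values, and splits the result into a list of rows, since the question asked for a list. As a side note, the IndexError in the question is consistent with `to_categorical` expecting zero-based class indices, so label 3 with `num_classes = 3` falls out of range.

```r
# Hypothetical helper: one-hot encode a vector of 1-based integer labels.
one_hot <- function(y, num_classes = max(y)) {
  y <- as.integer(y)
  # Assumption: labels run from 1 to num_classes.
  stopifnot(all(y >= 1L), all(y <= num_classes))
  m <- matrix(0L, nrow = length(y), ncol = num_classes)
  # Indexing with a two-column matrix: each row is a (row, column) pair.
  m[cbind(seq_along(y), y)] <- 1L
  m
}

y <- c(1, 2, 3, 3)
m1 <- one_hot(y)

# model.matrix() returns the same values, plus dimnames and extra attributes.
m2 <- model.matrix(~ factor(y) - 1)
same_values <- all(m1 == m2)

# If a list with one vector per observation is needed, split the matrix by row.
rows <- lapply(seq_len(nrow(m1)), function(i) m1[i, ])

# With keras available, shifting the labels to start at 0 should avoid the
# IndexError from the question (not run here):
# to_categorical(y - 1, num_classes = 3)
```

Passing `num_classes` explicitly is useful when some class does not occur in `y`; for example, `one_hot(c(1, 2), num_classes = 3)` still produces three columns, whereas `max(y)` would give only two.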