Tags: python, numpy, vectorization, one-hot-encoding

Transform 2d numpy array into 2d one hot encoding


How would I transform

a=[[0,6],
   [3,7],
   [5,5]]

into

b=[[1,0,0,0,0,0,1,0],
   [0,0,0,1,0,0,0,1],
   [0,0,0,0,0,1,0,0]]

Note that the last row of b has only one value set to 1, because the last row of a repeats the same value (5).


Solution

  • Using integer array indexing:

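    import numpy as np

    a = np.array([[0,6],
                  [3,7],
                  [5,5]])

    # One row per row of a, one column per possible value (0..a.max())
    b = np.zeros((len(a), a.max()+1), dtype=int)

    # a.T has shape (2, 3); the row indices (shape (3,)) broadcast against it,
    # so b[i, a[i, j]] is set to 1 for every i, j
    b[np.arange(len(a)), a.T] = 1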
    

    Output:

    array([[1, 0, 0, 0, 0, 0, 1, 0],
           [0, 0, 0, 1, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 0]])
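
The same idea can be sketched as a reusable function. This is a minimal sketch, not a definitive implementation: the name `multi_hot` and the `num_classes` parameter are hypothetical, introduced here only for illustration. It uses a column vector of row indices instead of transposing a, and then checks the result against `np.put_along_axis`, which should give the same array for this input.

```python
import numpy as np

def multi_hot(a, num_classes=None):
    """Hypothetical helper: set b[i, v] = 1 for every value v in row i of a."""
    a = np.asarray(a)
    if num_classes is None:
        # Assumption: values are non-negative integers, so max + 1 columns suffice
        num_classes = a.max() + 1
    b = np.zeros((a.shape[0], num_classes), dtype=int)
    # rows has shape (n, 1) and broadcasts against a's shape (n, k)
    rows = np.arange(a.shape[0])[:, None]
    # Repeated values in a row simply write 1 to the same cell again
    b[rows, a] = 1
    return b

a = np.array([[0, 6],
              [3, 7],
              [5, 5]])
b = multi_hot(a)

# Alternative: write 1 at the column positions given by a, along axis 1
b2 = np.zeros_like(b)
np.put_along_axis(b2, a, 1, axis=1)
```

Passing `num_classes` explicitly (for example `multi_hot(a, 10)`) may be useful when the largest possible value does not appear in a, since `a.max()+1` would otherwise produce too few columns.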