Tags: arrays, matlab, statistics, data-analysis

Matlab One Hot Encoding - convert column with categoricals into several columns of logicals


CONTEXT

I have a large number of categorical columns, each with its own set of unordered (unrankable) categories. To make analysis easier, I'd like to convert each of them into several logical columns, one per category. For example:

1   GENRE
2   Pop
3   Classical
4   Jazz

...would turn into...

1   Pop  Classical  Jazz
2   1    0          0
3   0    1          0
4   0    0          1

PROBLEM

I've tried using ind2vec, but it only works with numeric or logical inputs. I've also come across this, but I'm not sure it works with categoricals. What is the right function to use in this case?


Solution

  • To convert a categorical vector to a logical array, use the third output of the unique function as a column index for each element, then build the encoding with any of the options from this related question:

    % Sample data:
    data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});
    
    % Get unique categories and create indices:
    [genre, ~, index] = unique(data)
    
    genre = 
    
         Classical 
         Jazz 
         Pop 
    
    
    index =
    
         3
         1
         2
         3
         3
         2
    
    % Create logical matrix:
    mat = logical(accumarray([(1:numel(index)).' index], 1))
    
    mat =
    
      6×3 logical array
    
       0   0   1
       1   0   0
       0   1   0
       0   0   1
       0   0   1
       0   1   0
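Since the desired output in the question has the category names as column headers, here is a minimal sketch of a variant, assuming MATLAB R2016b or later (for implicit expansion). It compares the indices from unique against 1:numel(genre) instead of calling accumarray, then wraps the result in a table with array2table. The variable name oneHot is illustrative only.

```matlab
% Sample data, same as in the answer above
data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});

% Column index of each element's category (third output of unique)
[genre, ~, index] = unique(data);

% Compare the N-by-1 index vector with the 1-by-K row 1..K:
% implicit expansion yields an N-by-K logical matrix, one column per category
mat = index == 1:numel(genre);

% Use the category names as column headers, matching the layout in the question
oneHot = array2table(mat, 'VariableNames', cellstr(genre)');
disp(oneHot)
```

Here mat is identical to the accumarray result. In releases before R2019b, table variable names had to be valid MATLAB identifiers, so categories such as 'Hip Hop' may need matlab.lang.makeValidName first.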
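The question mentions many categorical columns. Assuming they are stored in a MATLAB table, the same encoding can be applied column by column and the resulting pieces concatenated. This is a sketch only: the table T, its columns Genre and Format, and the Column_Category naming scheme are hypothetical and chosen for illustration.

```matlab
% Hypothetical table with two categorical columns (illustration only)
T = table(categorical({'Pop'; 'Jazz'; 'Pop'}), ...
          categorical({'Vinyl'; 'CD'; 'CD'}), ...
          'VariableNames', {'Genre', 'Format'});

names = T.Properties.VariableNames;
blocks = cell(1, numel(names));            % one table of logical columns per input column
for k = 1:numel(names)
    col = T.(names{k});                    % categorical column
    [cats, ~, idx] = unique(col);          % categories present and per-row index
    colNames = strcat(names{k}, '_', cellstr(cats));   % e.g. 'Genre_Pop'
    blocks{k} = array2table(idx == 1:numel(cats), 'VariableNames', colNames');
end

% Concatenate the per-column blocks side by side
encoded = [blocks{:}];
disp(encoded)
```

Because unique only returns categories that actually occur, unused categories get no column. If you need one column per defined category, double(col) == 1:numel(categories(col)) uses the category codes instead (undefined elements become all-false rows).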