In Matlab, I have a cell array of cell arrays created using textscan
on a tab-separated data file. Some columns of the data set are strings. Here is an example array, 'data'
:
data{1,1} = {1; 2; 3; 4; 5};
data{1,2} = {11; 12; 13; 14; 15};
data{1,3} = {'Ringo'; 'Paul'; 'The Beatles'; 'John'; 'George'};
I want to convert the string columns to specific numerical stand-in values (i.e. 'The Beatles' = 1, 'John' = 2, 'Paul' = 3, 'George' = 4, 'Ringo' = 5). Also, it may be important to note that some cells contain more than one "word".
For some context, the entire data set will ultimately be converted from a cell array into one neatly packaged double array (data = cell2mat(data)
) for easier manipulation.
The only solution I know is to loop through the string array and use a switch statement:
vec_0 = data{1,3};
for ii = 1:length(vec_0)
switch vec_0{ii}
case 'The Beatles'
vec_f{ii,1} = 1;
case 'John'
vec_f{ii,1} = 2;
case 'Paul'
vec_f{ii,1} = 3;
case 'George'
vec_f{ii,1} = 4;
case 'Ringo'
vec_f{ii,1} = 5;
end
end
% Replace string column with numerical stand-ins
data{1,3} = vec_f;
% Convert cell array to double array
data = cell2mat(data);
Now we have:
data = [1 11 5; 2 12 3; 3 13 1; 4 14 2; 5 15 4];
What is a more optimal way to do this?
Let's say you have a mapping of strings to numeric values as defined in the following 5-by-2 cell array (one mapping per row):
numMap = {'The Beatles', 1; 'John' , 2; 'Paul' , 3; 'George' , 4; 'Ringo' , 5};
Then you can use ismember
to convert the strings to their mapped numeric values and save it back in data
like so:
[~, index] = ismember(data{1, 3}, numMap(:, 1));
data{1, 3} = numMap(index, 2);
You can generate an array of numerical stand-ins with unique
, convert that to a cell array with num2cell
, and save it back in data
like so:
[uniqueStrings, ~, numIndex] = unique(data{1, 3}, 'stable');
data{1, 3} = num2cell(numIndex);
And uniqueStrings
will contain the unique multi-word strings from data{1, 3}
.
Once you've chosen one of the above options, you can then convert your sample data to a 5-by-3 double array like so:
data = cell2mat([data{:}]);
data =
1 11 5
2 12 3
3 13 1
4 14 2
5 15 4