How to find unique words in each row of a two-dimensional matrix in MATLAB

I have a huge two-dimensional matrix (A). Each cell of this matrix is either empty or contains a word. I want to find the unique words in each row of this matrix separately and store them in another two-dimensional matrix (B) such that the Kth row of B contains the unique elements of the Kth row of A. I tried the following, but MATLAB reported that the input must be a cell array:

 % engine
 B = sort(A,2) ;
 d = [true(1,size(B,2)) ; diff(B)>0] ;
 B = mat2cell(B(d).',1,sum(d));

 % check if B{K} contains the unique elements of the Kth row of A
 for i=1:size(A,1),
     tf(i) = isequal(B{i},unique(A(i,:))) ;
 end
 all(tf)

I would appreciate your help to solve this error.

Solution

You almost have it correct. If I understand your question correctly, you want to iterate over each row of your matrix, find the unique words, and store them in the corresponding entry of an output cell array. Here is an example using a 5 x 3 cell array of words:

A = 'Hi hi hi how are you my my name is Ray Ray Ray StackOverflow StackOverflow';
Acell = reshape(strsplit(A, ' '), 3, 5).'; % // Use for MATLAB R2013a and up
%//Acell = reshape(regexp(A, ' ', 'split'), 3, 5).'; %// Use for MATLAB R2012b and below

Here is what Acell looks like:

Acell = 

'Hi'     'hi'               'hi'           
'how'    'are'              'you'          
'my'     'my'               'name'         
'is'     'Ray'              'Ray'          
'Ray'    'StackOverflow'    'StackOverflow'

Now, let's insert some blank strings into the cell array to mimic your situation:

Acell{1,1} = '';
Acell{4,1} = '';

Therefore:

Acell = 

''       'hi'               'hi'           
'how'    'are'              'you'          
'my'     'my'               'name'         
''       'Ray'              'Ray'          
'Ray'    'StackOverflow'    'StackOverflow'
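
A side note on what "empty" means here. The example above uses empty strings (''). If the empty cells in your matrix are instead stored as [] (an empty numeric array), the array is no longer a cell array of strings, and unique will most likely reject it, which may be the source of the error you are seeing. A minimal sketch of normalising such cells first, using a small made-up array named Araw (a hypothetical name, not part of your code):

```matlab
% Hypothetical input: empty cells stored as [] instead of ''
Araw = {[],    'hi', 'hi';
        'how', [],   'you'};

% Logical mask marking every empty cell
emptyMask = cellfun(@isempty, Araw);

% Replace the empty cells with empty strings so that every
% element is a char array
Araw(emptyMask) = {''};

% Araw is now a valid cell array of strings
isValid = iscellstr(Araw);
```

After this step the same per-row unique approach shown below should apply unchanged.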

Now, let's initialize the matrix B as a cell array that will store this output:

B = cell(size(Acell, 1), 1);

B has as many rows as Acell. However, the rows will generally hold different numbers of unique words. MATLAB does not support matrices with an unequal number of columns per row, so the only way to accomplish what you want is to make each element of B a cell array in its own right. Now we can simply loop through each row of Acell, run unique, and assign the result to the corresponding element of B:

for idx = 1 : size(Acell, 1)
    B{idx} = unique(Acell(idx,:));
end

Now let's see B:

B = 

{1x2 cell}
{1x3 cell}
{1x2 cell}
{1x2 cell}
{1x2 cell}

Let's display each cell on its own:

for idx = 1 : numel(B)
    disp(B{idx});
end

We thus get:

''    'hi'

'are'    'how'    'you'

'my'    'name'

''    'Ray'

'Ray'    'StackOverflow'

You'll notice that the words in each row are also sorted. That's how unique orders its output by default. Also note that unique is case sensitive: it does differentiate between upper case and lower case letters. As such, Hi and hi would count as different words. If this is not your desired behaviour and you want case to be ignored, convert all of the letters to lower case with the lower function before doing any processing. You can convert all of your strings to lower case by using cellfun:

Alower = cellfun(@lower, Acell, 'UniformOutput', false);
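
To make the effect concrete, here is a rough end-to-end sketch of the case-insensitive variant. It rebuilds the original sample data (before any blanks were inserted) so that it runs on its own, calls lower on the cell array directly, which as far as I know is equivalent to the cellfun call above, and stores the result in Bci, a name made up for this illustration:

```matlab
% Rebuild the sample data so the snippet is self-contained
A = 'Hi hi hi how are you my my name is Ray Ray Ray StackOverflow StackOverflow';
Acell = reshape(regexp(A, ' ', 'split'), 3, 5).';

% lower accepts a cell array of strings and converts every element
Alower = lower(Acell);

% Per-row unique on the lower-cased copy
Bci = cell(size(Alower, 1), 1);
for idx = 1 : size(Alower, 1)
    Bci{idx} = unique(Alower(idx, :));
end

% The first row was 'Hi' 'hi' 'hi', so Bci{1} holds the single word 'hi'
```

Keep in mind that the results contain the lower-cased words, so the original capitalisation (for example Ray) is lost in Bci.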

If you compare B with Acell, you will see that each element of B holds the unique words of the corresponding row.
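
If the sorted order is not what you want, unique also accepts a 'stable' flag that keeps the words in order of first appearance. This assumes a MATLAB release recent enough to support that option, which I have not checked against your version. A small sketch on a single made-up row:

```matlab
% Hypothetical row of words with one repeat
row = {'you', 'are', 'you', 'how'};

% Default behaviour: unique words in sorted order
sortedWords = unique(row);            % {'are', 'how', 'you'}

% 'stable' keeps the order in which each word first appears
stableWords = unique(row, 'stable');  % {'you', 'are', 'how'}
```

Inside the loop, that would amount to passing 'stable' as the second argument of the unique call.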