Search code examples
databasematlabgroupingcell-array

How to make database-like grouping on cell array or struct in Matlab?


I have tabular data in Matlab. It can be stored either in struct or in cell array.

How to perform grouping like done with unique(A,'rows')? It is not working for cell arrays for some reason.

Are there some approaches?

UPDATE

>> A={'Morgan', 'male', 12
'Morgan', 'female', 7
'Dottie', 'female', 5
'Dottie', 'female', 13}
A = 
    'Morgan'    'male'      [12]
    'Morgan'    'female'    [ 7]
    'Dottie'    'female'    [ 5]
    'Dottie'    'female'    [13]
>> A(:,1:2)
ans = 
    'Morgan'    'male'  
    'Morgan'    'female'
    'Dottie'    'female'
    'Dottie'    'female'
>> B=ans;
>> unique(B,'rows')
Warning: The 'rows' input is not supported for cell array inputs. 
> In cell.unique>celluniqueR2012a at 237
  In cell.unique at 149 
ans = 
    'Dottie'
    'Morgan'
    'female'
    'male'

As you see it is not grouping by row, it is grouping by all values in bag.

UPDATE 2

The only approach I am researching is like follows

>> [cell2mat(B(:,1)) repmat(':',size(B,1),1) cell2mat(B(:,2))]
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 84)
            m{n} = cat(1,c{:,n}); 

But it is (1) incredibly complex and (2) does not work yet.

UPDATE 3

I was looking for what is possible with statistics toolbox:

>> A={'Name', 'Gender', 'Age'; 'Ann', false, 20; 'John', true, 25; 'Peter', true, 30; 'Ann', false, 28}
A = 
    'Name'     'Gender'    'Age'
    'Ann'      [     0]    [ 20]
    'John'     [     1]    [ 25]
    'Peter'    [     1]    [ 30]
    'Ann'      [     0]    [ 28]
>> B=cell2dataset(A,'ReadVarNames',true)
B = 
    Name           Gender    Age
    'Ann'          false     20 
    'John'         true      25 
    'Peter'        true      30 
    'Ann'          false     28 
>> grpstats(B,{'Name','Gender'},{'numel'})
ans = 
               Name           Gender    GroupCount    numel_Age
    Ann_0      'Ann'          false     2             2        
    John_1     'John'         true      1             1        
    Peter_1    'Peter'        true      1             1        

Solution

  • If understand your example correctly, the output you want is:

    ans =
        'Morgan'    'male'
        'Morgan'    'female'
        'Dottie'    'female'
    

    unique ignores rows in cell arrays. You can trick it by stitching your columns together. Once you've found the unique rows, you can use that list to get the information out of your cell array.

    % Use A as you defined it
    B = A(:, 1:2);
    C = strcat(B(:,1), ';', B(:,2)); % this gives you 'Morgan;male', etc.
    [D cRow] = unique(C); % Find the unique rows in your new 1-column list
    
    E = B(sort(cRow), :); % Find the values in the cell array given the unique rows
    

    This gives:

    E =
        'Morgan'    'male'
        'Morgan'    'female'
        'Dottie'    'female'
    

    If that isn't the output you were looking for, please update your question.