Tags: matlab, optimization, types, cell-array

Fastest way to get class types of elements of a cell array


I have a (large) cell array, with various data types. For example,

 myCell = { 1, 2, 3, 'test',  1 , 'abc';
            4, 5, 6, 'foob', 'a', 'def' };

This can include more obscure types like java.awt.Color objects.

I want to ensure that the data in each column is of the same type, since I want to perform table-like operations on it. However, this process seems very slow!

My current method is to use cellfun to get the classes, and strcmp to check them:

% Get class of every cell element
types = cellfun( @class, myCell, 'uni', false );
% Check that they are consistent for each column
typesOK = all( strcmp(repmat(types(1,:), size(types,1), 1), types), 1 );
% Output the types (mixed type columns can be handled using typesOK)
types = types(1, :);

% Output for the above example: 
% >> typesOK = [1 1 1 1 0 1]
% >> types = {'double', 'double', 'double', 'char', 'double', 'char'}

I had thought to use cell2table, since it does type checking for the same reason. However, it doesn't give me the desired result (which columns are which types, strictly).

Is there a quicker way to check type consistency within a cell array's columns?


Edit: I've just done some profiling...

It appears the types = cellfun( @class, ...) line takes over 90% of the processing time. If your method is only subtly different from mine, that is the line which should change; the strcmp is pretty quick.
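
A rough way to reproduce this measurement, assuming timeit is available (R2013b or later). The array size and the names bigCell, tClass and tStrcmp are arbitrary choices for illustration, and timings will vary by machine and release:

```matlab
% Build a larger version of the example so the timings are measurable
myCell = { 1, 2, 3, 'test',  1 , 'abc';
           4, 5, 6, 'foob', 'a', 'def' };
bigCell = repmat(myCell, 10000, 5);   % arbitrary size, for timing only

% Class names are needed up front to time the strcmp step on its own
types = cellfun(@class, bigCell, 'UniformOutput', false);

% timeit calls each function handle several times and returns a typical time
tClass  = timeit(@() cellfun(@class, bigCell, 'UniformOutput', false));
tStrcmp = timeit(@() all(strcmp(repmat(types(1,:), size(types,1), 1), types), 1));

fprintf('cellfun(@class): %.4f s, strcmp check: %.4f s\n', tClass, tStrcmp);
```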


Edit: I was fortunate to have many suggestions for this problem, and I have compiled them all into a benchmarking answer for performance tests.


Solution

  • I haven't tested whether this is faster for very large arrays, but maybe something like this:

    function [b] = IsTypeConsistentColumns(myCell)
    %[
        b = true;
        try
            for ci = 1:size(myCell, 2)
               cell2mat(myCell(:, ci));
            end
        catch err
            if (strcmpi(err.identifier, 'MATLAB:cell2mat:MixedDataTypes'))
                b = false;
            else
                rethrow(err);
            end
        end
    %]
    end
    

    Whether this is faster depends on how fast cell2mat is compared to your string comparison (even though the result of cell2mat is not used here).

    Note that cell2mat throws an error if the types are not consistent (identifier: 'MATLAB:cell2mat:MixedDataTypes', message: 'All contents of the input cell array must be of the same data type.').
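
    As a minimal sketch of that behaviour, the snippet below feeds cell2mat a single mixed column (a double and a char, like column 5 of the question's example) and captures the error. The variable names mixedColumn, errId and errMsg are only for illustration:

    ```matlab
    % One column that mixes a double and a char
    mixedColumn = {1; 'a'};
    errId  = '';
    errMsg = '';
    try
        cell2mat(mixedColumn);   % result is discarded, only the check matters
    catch err
        % Keep the identifier and message so they can be inspected
        errId  = err.identifier;
        errMsg = err.message;
    end
    disp(errId);
    disp(errMsg);
    ```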

    EDIT: limiting the check to the cellfun('isclass', c, cellclass) test

    This version uses only the type-consistency check that the cell2mat routine performs internally:

    function [consistences, types] = IsTypeConsistentColumns(myCell)
    %[
        ncols = size(myCell, 2);
        consistences = false(1, ncols);
        types = cell(1, ncols);
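        for ci = 1:ncols
            % Use the class of the first row as the reference for this column
            cellclass = class(myCell{1, ci});
            % Legacy cellfun syntax: true where an element's class matches cellclass
            ciscellclass = cellfun('isclass', myCell(:, ci), cellclass);
    
            consistences(ci) = all(ciscellclass);
            types{ci} = cellclass;
        end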
    %]
    end
    

    With your test case myCell = repmat( { 1, 2, 3, 'test', 1 , 'abc'; 4, 5, 6, 'foob', 'a', 'def' }, 10000, 5 );, it takes about 0.0123 seconds on my computer with R2015b. It could be even faster if you stop at the first inconsistent column (here I'm testing them all).
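
    A possible early-exit variant is sketched below. The function name AllColumnsTypeConsistent is hypothetical, it has not been benchmarked, and it only reports whether all columns are consistent rather than returning the per-column types:

    ```matlab
    function tf = AllColumnsTypeConsistent(myCell)
    % Hypothetical early-exit variant: returns false as soon as one column
    % mixes classes, instead of checking every column.
        tf = true;
        if isempty(myCell)
            return;   % nothing to compare, treat as consistent
        end
        for ci = 1:size(myCell, 2)
            % Class of the first row is the reference for this column
            cellclass = class(myCell{1, ci});
            if ~all(cellfun('isclass', myCell(:, ci), cellclass))
                tf = false;
                return;   % stop at the first inconsistent column
            end
        end
    end
    ```

    For example, AllColumnsTypeConsistent({1, 'a'; 2, 'b'}) returns true, while the question's myCell returns false because of its fifth column.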