Tags: matlab, compare, detection, string-comparison

How to compare text strings in a table column in MATLAB


If I have an N-by-1 table column, how can I detect whether any of the rows are identical?


Solution

  • If you simply want to determine whether there are duplicate rows, you can use unique. Compare the number of unique values in the column with the total number of elements (numel) in the same column; if there are fewer unique values than elements, at least one value is repeated.

    % true if the column contains at least one duplicate
    tf = numel(unique(t.Column)) < numel(t.Column)
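
    As a quick sanity check, the count comparison can be tried on a small table. This is only a sketch: the sample strings are made up, and the variable name Column simply mirrors the placeholder used in this answer.

    ```matlab
    % Hypothetical sample data; Column is a cell array of character vectors.
    t = table({'apple'; 'pear'; 'apple'; 'fig'; 'pear'}, ...
              'VariableNames', {'Column'});

    nUnique  = numel(unique(t.Column));  % 3 distinct strings
    nTotal   = numel(t.Column);          % 5 rows
    hasDupes = nUnique < nTotal;         % true, because some strings repeat
    ```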
    

    If you want to determine which values are duplicated, use unique again, this time with its third output, which maps each row to its entry in the list of unique values. accumarray then counts the occurrences of each value, and you keep the values that appear more than once.

    [vals, ~, inds] = unique(t.Column, 'stable'); 
    repeats = vals(accumarray(inds, 1) > 1);
    
    % And to print them out:
    fprintf('Duplicate value: %s\n', repeats{:})
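
    To make the roles of inds and accumarray concrete, here is the same idea traced on a small made-up column (the sample strings and the intermediate name counts are illustrative assumptions, not part of the original answer):

    ```matlab
    % Hypothetical sample column (cell array of character vectors).
    col = {'apple'; 'pear'; 'apple'; 'fig'; 'pear'};

    [vals, ~, inds] = unique(col, 'stable');
    % vals = {'apple'; 'pear'; 'fig'}   first-appearance order
    % inds = [1; 2; 1; 3; 2]            col is the same as vals(inds)

    counts  = accumarray(inds, 1);      % [2; 2; 1], occurrences of each entry of vals
    repeats = vals(counts > 1);         % {'apple'; 'pear'}
    ```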
    

    If you want a logical vector that is true for every row whose value occurs more than once, you can do something similar:

    [vals, ~, inds] = unique(t.Column, 'stable');
    result = ismember(inds, find(accumarray(inds, 1) > 1));
    

    Or

    [vals, ~, inds] = unique(t.Column, 'stable');
    result = sum(bsxfun(@eq, inds, inds.'), 2) > 1;
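
    A possible variation, not from the original answer: because inds already maps every row to its unique value, indexing the per-value counts with inds gives the same logical vector directly. The sketch below uses a made-up sample column and checks that it agrees with the ismember version.

    ```matlab
    % Hypothetical sample column (cell array of character vectors).
    col = {'apple'; 'pear'; 'apple'; 'fig'; 'pear'};

    [~, ~, inds] = unique(col, 'stable');
    counts = accumarray(inds, 1);        % occurrences of each unique value

    % Look up each row's count and flag rows whose value appears more than once.
    result = counts(inds) > 1;           % [true; true; true; false; true]

    % Same answer as the ismember-based approach.
    resultIsmember = ismember(inds, find(counts > 1));
    ```

    In R2016b or later, implicit expansion also lets the bsxfun call be written as sum(inds == inds.', 2) > 1. Both pairwise forms build an N-by-N matrix, so the count lookup is likely the lighter choice for long columns.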
    

    Update

    You can combine the approaches above to report whether any duplicates exist and, for each duplicated value, the rows where it occurs.

    [vals, ~, inds] = unique(t.Column, 'stable'); 
    repeats = vals(accumarray(inds, 1) > 1);
    
    hasDupes = ~isempty(repeats);
    
    if hasDupes
        for k = 1:numel(repeats)
            fprintf('Duplicate value: %s\n', repeats{k});
            fprintf('   Found at: ');
            fprintf('%d ', find(strcmp(repeats{k}, t.Column)));
            fprintf('\n');
        end
    end