Search code examples
matlabmatrixunique

How to find number of unique elements in each row of a matrix?


I'm trying to implement some sort of interpolation algorithm. I is an N*4 matrix, holding indices of surrounding points of N other points. But elements in each row of I may not be unique, meaning two or more of them may refer to an individual point. I want to know how many unique indices are available in each row, and I want to do it as fast as possible since N is big!


Solution

  • Well, Mohsen's answer is a general solution for this problem, but arrayfun was too slow for me. So I thought a little more about it and found a much faster solution. I compare all pairs of columns and increase a counter if they were equal:

    tic;
    S = zeros(N, 1, 'uint32');
    Nu = S+4; % in my case most of point are surrounded by four different points
    for i=1:3
        for j=(i+1):4
            S = S + uint32(I(:, i)==I(:, j));
        end
    end
    % Nu(S==0) = 4;
    Nu(S==1) = 3;
    Nu((S==2)|(S==3)) = 2; % why? :)
    Nu(S==6) = 1;
    toc;
    

    For N=189225, arrayfun takes 14.73s on my PC but summation takes only 0.04s.

    Edit: Take care of different numbers of columns

    Here's a modification of the code above. Now we can also have the places of unique values in each row! This one hasn't the :) problem and can be used for higher numbers of columns. Still taking 0.04s on my PC for 189225 rows.

    tic;
    uniq = true(N, 4);
    for i=1:3
        for j=(i+1):4
              uniq(I(:, i)==I(:, j), j) = false;
        end
    end
    Nu = sum(uniq, 2);
    toc;
    

    Edit(2): Comparison with EBH's answer

    After a while I needed this for another problem where I wanted number of unique elements in each row of matrices with different numbers of columns. So I compared my code with EBH's to see if their code is faster. I ran both codes on matrices with rows from 10K to 100K, and columns from 6 to 60. The results are average of spent time (in seconds) of 3 different runs:

    enter image description here

    I'm testing this in 2016a and there has been a significant improvement in performance of for-loops in latest versions of MATLAB. So you may need to to compare it yourself if you want to run it in older versions.