Search code examples
arraysmatlabrowmatchingindices

indices of occurence of each row in MATLAB


I have two matrices, A and B. (B is continuous like 1:n)

I need to find all the occurrences of each individual row of B in A, and store those row indices accordingly in cell array C. See below for an example.

A = [3,4,5;1,3,5;1,4,3;4,2,1]
B = [1;2;3;4;5]

Thus,

C = {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}

Note C does not need to be in a cell array for my application. I only suggest it because the row vectors of C are of unequal length. If you can suggest a work-around, this is fine too.

I've tried using a loop running ismember for each row of B, but this is too slow when the matrices A and B are huge, with around a million entries. Vectorized code is appreciated.

(To give you context, the purpose of this is to identify, in a mesh, those faces that are attached to a single vertex. Note I cannot use the function edgeattachments because my data are not of the form "TR" in triangulation representation. All I have is a list of faces and list of vertices.)


Solution

  • Well, the best answer for this would require knowledge of how A is filled. If A is sparse, that is, if it has few columns values and B is quite large, then I think the best way for memory saving may be using a sparse matrix instead of a cell.

    % No fancy stuff, just fast and furious 
    bMax = numel(B);
    nRows = size(A,1);
    
    cLogical = sparse(nRows,bMax);
    
    for curRow = 1:nRows
      curIdx = A(curRow,:);
      cLogical(curRow,curIdx) = 1;
    end
    

    Answer:

    cLogical =
    
       (2,1)        1
       (3,1)        1
       (4,1)        1
       (4,2)        1
       (1,3)        1
       (2,3)        1
       (3,3)        1
       (1,4)        1
       (3,4)        1
       (4,4)        1
       (1,5)        1
       (2,5)        1
    

    How to read the answer. For each column the rows show the indexes that the column index appears in A. That is 1 appears in rows [2 3 4], 2 appear in row [4], 3 rows [1 2 3], 4 row [1 3 4], 5 in row [1 2].

    Then you can use cLogical instead of a cell as an indexing matrix in the future for your needs.

    Another way would be to allocate C with the expected value for how many times an index should appear in C.

    % Fancier solution using some assumed knowledge of A
    bMax = numel(B);
    nRows = size(A,1);
    nColumns = size(A,2);
    
    % Pre-allocating with the expected value, an attempt to reduce re-allocations.
    % tic; for rep=1:10000; C = mat2cell(zeros(bMax,nColumns),ones(1,bMax),nColumns); end; toc 
    % Elapsed time is 1.364558 seconds.
    % tic; for rep=1:10000; C = repmat({zeros(1,nColumns)},bMax,1); end; toc
    % Elapsed time is 0.606266 seconds.
    % So we keep the not fancy repmat solution
    C = repmat({zeros(1,nColumns)},bMax,1);
    for curRow = 1:nRows
      curIdxMsk = A(curRow,:);
      for curCol = 1:nColumns
        curIdx = curIdxMsk(curCol);
        fillIdx = ~C{curIdx};
        if any(fillIdx) 
          fillIdx = find(fillIdx,1);
        else
          fillIdx = numel(fillIdx)+1;
        end
        C{curIdx}(fillIdx) = curRow;
      end
    end
    
    % Squeeze empty indexes:
    for curRow = 1:bMax
      C{curRow}(~C{curRow}) = [];
    end
    

    Answer:

    >> C{:}
    
    ans =
    
         2     3     4
    
    
    ans =
    
         4
    
    
    ans =
    
         1     2     3
    
    
    ans =
    
         1     3     4
    
    
    ans =
    
         1     2
    

    Which solution will performs best? You do a performance test in your code because it depends on how big is A, bMax, the memory size of your computer and so on. Yet, I'm still curious with solutions other people can do for this x). I liked chappjc's solution although it has the cons that he has pointed out.

    For the given example (10k times):

    Solution 1: Elapsed time is 0.516647 seconds. 
    Solution 2: Elapsed time is 4.201409 seconds (seems that solution 2 is a bad idea hahaha, but since it was created to the specific issue of A having many rows it has to be tested in those conditions).
    chappjc' solution: Elapsed time is 2.405341 seconds.