
Remove duplicate rows of matrix and cell together in MATLAB


I have a matrix A with 30000 rows and a cell array B with the same number of rows. I would like to remove duplicate rows. If it were only the matrix A, I could use the function unique. But how can I do this for A (a matrix) and B (a cell array) together? Many thanks!

Examples of A and B are shown below. Rows 4 and 5 are duplicates (in both A and B), while rows 5 and 6 should not be treated as duplicates, because their entries in B differ.

A

1   2   3   4   
11  12  13  14
21  22  23  24
31  32  33  34
31  32  33  34
31  32  33  34
41  42  43  44

B

a
b
c
d
d
e
f
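For reference, the example data can be entered in MATLAB as below. This is a sketch that assumes the cells of B hold character vectors (a cellstr); the question only says that B is a cell array.

```matlab
% Example data from the question: rows 4, 5 and 6 of A are identical
A = [ 1  2  3  4
     11 12 13 14
     21 22 23 24
     31 32 33 34
     31 32 33 34
     31 32 33 34
     41 42 43 44];

% Assumption: B is a 7x1 cell array of character vectors (a cellstr)
B = {'a'; 'b'; 'c'; 'd'; 'd'; 'e'; 'f'};

% A and B must describe the same number of rows
assert(size(A, 1) == numel(B), 'A and B must have the same number of rows');
```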

Solution

  • You can do this with the second return value from unique:

    [C,ia,ic] = unique(A,'rows',occurrence)   % occurrence is 'first' or 'last'
    

    ia gives you the indices into A of the unique rows. If you do this on your matrix A, you get:

    >> [~,iA,~] = unique(A,'rows','first')
    iA =
    
       1
       2
       3
       4
       7
    

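    (I used the option 'first' because it seemed more natural to me to return row 4 than row 6. MATLAB's default has been 'first' since R2013a, while older releases defaulted to 'last', so it is safest to pass the option explicitly; either works as long as you use the same one in both calls.)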

    Since B is a cell array of character vectors, the 'rows' option does not apply:

    >> [~,iB,~] = unique(B,'first')
    iB =
    
       1
       2
       3
       4
       6
       7
    

    This shows that B distinguishes row 6 from row 4, even though A alone does not. Taking the set union of the two index lists gives:

    >> uAB = union(iA,iB)
    uAB =
    
       1
       2
       3
       4
       6
       7
    

    For this example, that gives the indices of all of the unique rows:

    >> A(uAB,:)
    ans =
    
        1    2    3    4
       11   12   13   14
       21   22   23   24
       31   32   33   34
       31   32   33   34
       41   42   43   44
    
    >> B(uAB)
    ans =
    {
      [1,1] = a
      [2,1] = b
      [3,1] = c
      [4,1] = d
      [5,1] = e
      [6,1] = f
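    }

    Note that the union is only guaranteed to work for data like this example. It can miss a row whose A row and B entry have each appeared earlier, but never together, because such a row is a first occurrence in neither iA nor iB. For instance, with A = [1;2;1] and B = {'x';'y';'y'}, all three rows are distinct pairs, yet iA = [1;2] and iB = [1;2], so the union drops row 3.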
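Taking the union of per-column first-occurrence indices can drop a row whose A row and B entry were each seen before, just not together. A more robust alternative (a different technique from the union above, sketched here assuming B is a cellstr) is to replace each row of A and each entry of B with the integer ID returned as the third output of unique, and then run unique once more on the two ID columns side by side. The small A and B below are made-up counterexample data, not the question's data.

```matlab
% Made-up counterexample: the pairs (1,'x'), (2,'y'), (1,'y') are all distinct
A = [1; 2; 1];
B = {'x'; 'y'; 'y'};

% Union of per-column first-occurrence indices
[~, iA] = unique(A, 'rows', 'first');
[~, iB] = unique(B, 'first');
uAB = union(iA, iB);            % [1; 2]: row 3 is missed

% Alternative: label each row of A and each entry of B with an integer ID
% (third output of unique), then deduplicate the ID pairs jointly
[~, ~, idA] = unique(A, 'rows');
[~, ~, idB] = unique(B);
[~, keep] = unique([idA(:) idB(:)], 'rows', 'first');
keep = sort(keep);              % restore the original row order

% Keep the same rows of A and B
A2 = A(keep, :);                % [1; 2; 1]
B2 = B(keep);                   % {'x'; 'y'; 'y'}
```

On the question's own data the same steps give keep = [1; 2; 3; 4; 6; 7], matching the union result for that example. In MATLAB releases that support it, unique(..., 'rows', 'stable') returns first-occurrence indices already in original order, which would make the sort call unnecessary.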