I have a matrix A with 30000 rows and a cell array B with the same number of rows. I would like to remove duplicate rows. If it were only the matrix A, I could use the function unique. But how can I proceed for A (matrix) and B (cell array) together? Many thanks!
Examples of A and B are shown below. Rows 4 and 5 are duplicates (in both A and B), while rows 5 and 6 should not be treated as duplicates.
A
1 2 3 4
11 12 13 14
21 22 23 24
31 32 33 34
31 32 33 34
31 32 33 34
41 42 43 44
B
a
b
c
d
d
e
f
You can do this with the second return value from unique:
[C,ia,ic] = unique(A,'rows',setOrder)
ia gives you the indices into A of the unique rows. If you do this on your matrix A, you get:
>> [~,iA,~] = unique(A,'rows','first')
iA =
1
2
3
4
7
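(I used the option 'first' because it seemed more natural to me to return row 4 than row 5. You can use 'last' if you prefer, as long as you're consistent. Which occurrence is returned by default depends on the release: older MATLAB versions return the last occurrence, newer ones the first, so it is safest to state the option explicitly.)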
Since B is a cell array, you don't need the 'rows' option:
>> [~,iB,~] = unique(B,'first')
iB =
1
2
3
4
6
7
This tells us that, despite what matrix A tells us, row 6 is distinct from row 4. If we take the set union of these two index lists, we get:
>> uAB = union(iA,iB)
uAB =
1
2
3
4
6
7
For this example, uAB now holds the indices of all of the unique rows:
>> A(uAB,:)
ans =
1 2 3 4
11 12 13 14
21 22 23 24
31 32 33 34
31 32 33 34
41 42 43 44
>> B(uAB)
ans =
{
[1,1] = a
[2,1] = b
[3,1] = c
[4,1] = d
[5,1] = e
[6,1] = f
}
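The union of iA and iB keeps a row when either its A part or its B part appears for the first time. That is enough for the data above, but it would skip a row whose combination is new even though both parts have already appeared separately (for example A = [1;1;2;2] with B = {'a';'b';'a';'b'}, where row 4 is dropped). A minimal sketch of one way around this, assuming B is a cell array of strings as in the question: label each row of A and each entry of B with the third output of unique, then call unique on the pairs of labels. The names gA, gB, keep, A_unique and B_unique are only illustrative.

```matlab
% Example data from the question
A = [ 1  2  3  4;
     11 12 13 14;
     21 22 23 24;
     31 32 33 34;
     31 32 33 34;
     31 32 33 34;
     41 42 43 44];
B = {'a'; 'b'; 'c'; 'd'; 'd'; 'e'; 'f'};

% Third output of unique: a group label for every row of A / entry of B
[~, ~, gA] = unique(A, 'rows');
[~, ~, gB] = unique(B);

% A combined row is a duplicate only if both labels repeat together,
% so look for unique rows of the label pairs
[~, keep] = unique([gA(:) gB(:)], 'rows', 'first');
keep = sort(keep);          % keep the original row order

A_unique = A(keep, :);
B_unique = B(keep);
```

On the example data, keep contains rows 1, 2, 3, 4, 6 and 7, the same rows as uAB above.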