I've got a list of ndarrays (all arrays are of the same dimensions but there can be n dimensions) which I need to compare with each other to eliminate duplicates. For that reason I am trying to sort all arrays equally.
A quick example for 2D and 3D arrays:
[[1, 0], [0, 1]] == [[0, 1], [0, 1]] == [[1, 0], [1, 0]] == [[0, 1], [1, 0]]
[[[1, 1], [0, 2]], [[0, 0], [0, 0]]] == [[[0, 0], [0, 0]], [[1, 1], [0, 2]]] == [[[0, 0], [0, 0]], [[0, 2], [1, 1]]] ...
I don't care how it is ordered; I just want to find, handle, and afterwards eliminate duplicates.
I tried using numpy.sort(array), which can sadly miss duplicates (e.g. [[0,0],[0,1]] and [[0,1],[0,0]]).
If I sort along every dimension, starting with the highest, it destroys my structure (e.g. [[1, 1], [0, 2]] becomes [[0, 1], [1, 2]]).
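To make both failure modes concrete, here is a minimal pure-Python sketch that mimics per-axis sorting of a 2D array with nested lists. The helper names `sort_rows` and `sort_columns` are made up for illustration and are not NumPy functions; the sketch only assumes that sorting along the last axis sorts each row on its own, and that sorting along the first axis sorts each column on its own.

```python
def sort_rows(matrix):
    # Mimics sorting along the last axis: each row is sorted independently.
    return [sorted(row) for row in matrix]


def sort_columns(matrix):
    # Mimics sorting along the first axis: each column is sorted independently.
    columns = [sorted(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


a = [[0, 0], [0, 1]]
b = [[0, 1], [0, 0]]
# Row-wise sorting leaves both unchanged, so this duplicate pair is missed.
print(sort_rows(a) == sort_rows(b))  # False

c = [[1, 1], [0, 2]]
# Sorting rows and then columns mixes values that came from different rows.
print(sort_columns(sort_rows(c)))  # [[0, 1], [1, 2]]
```

The second result no longer contains either of the original rows, which is the loss of structure described above.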
From a nested-list point of view, I think the best idea is to first sort all lists at depth n, and afterwards sort the nested lists at depths n-1 ... 0, ordering first by sum and then, if the sums are equal, by comparing the elements.
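If I understand correctly, you could use a recursive sorter. I'm providing a pure Python solution here since I believe there is no advantage to using NumPy. Python compares lists element by element, so sorted can order the already-sorted sublists directly: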
def sorter(lst):
    if isinstance(lst, list):
        # sort the children first, then sort this level
        return sorted(map(sorter, lst))
    else:
        # leaf value: return it unchanged
        return lst
Or, perhaps more generically, using try/except (this would also work on NumPy arrays):
def sorter(lst):
    try:
        # works for any iterable, such as a list or an array
        return sorted(map(sorter, lst))
    except TypeError:
        # not iterable, so this is a leaf value
        return lst
Examples:
example1 = ([[1, 0], [0, 1]],
            [[0, 1], [0, 1]],
            [[1, 0], [1, 0]],
            [[0, 1], [1, 0]],
            [[0, 0], [1, 1]],  # different
            )
example2 = ([[[1, 1], [0, 2]], [[0, 0], [0, 0]]],
            [[[0, 0], [0, 0]], [[1, 1], [0, 2]]],
            [[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
            [[[0, 0], [0, 0]], [[1, 1], [2, 0]]],
            [[[0, 0], [0, 0]], [[1, 2], [1, 0]]],  # different
            )
[sorter(x) for x in example1]
[[[0, 1], [0, 1]],
[[0, 1], [0, 1]],
[[0, 1], [0, 1]],
[[0, 1], [0, 1]],
[[0, 0], [1, 1]]] # different
[sorter(x) for x in example2]
[[[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
[[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
[[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
[[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
[[[0, 0], [0, 0]], [[0, 1], [1, 2]]]] # different
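Once every array has a canonical form, removing duplicates could look roughly like the sketch below. The helpers `freeze` and `drop_duplicates` are hypothetical names introduced here, not part of the code above; `freeze` is needed only because nested lists are unhashable and cannot be stored in a set. The sketch assumes plain nested lists as input; NumPy arrays could be converted with `.tolist()` first.

```python
def sorter(lst):
    # Recursive canonical form, same as the list-based version above.
    if isinstance(lst, list):
        return sorted(map(sorter, lst))
    return lst


def freeze(obj):
    # Hypothetical helper: turn nested lists into hashable nested tuples.
    if isinstance(obj, list):
        return tuple(freeze(x) for x in obj)
    return obj


def drop_duplicates(arrays):
    # Hypothetical helper: keep the first array seen for each canonical form.
    seen = set()
    unique = []
    for arr in arrays:
        key = freeze(sorter(arr))
        if key not in seen:
            seen.add(key)
            unique.append(arr)
    return unique


example1 = [
    [[1, 0], [0, 1]],
    [[0, 1], [0, 1]],
    [[1, 0], [1, 0]],
    [[0, 1], [1, 0]],
    [[0, 0], [1, 1]],  # different
]

print(drop_duplicates(example1))
# [[[1, 0], [0, 1]], [[0, 0], [1, 1]]]
```

Keeping the first occurrence is an arbitrary choice; if the duplicates need handling before they are dropped, the same key could instead index a dict that collects all arrays sharing a canonical form.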