python numpy sorting numpy-ndarray

Sort n-dimensional numpy array by sorting highest dimension and afterwards reordering entire segments


I've got a list of ndarrays (all arrays have the same dimensions, but there can be n dimensions) which I need to compare with each other to eliminate duplicates. For that reason I am trying to sort all arrays in the same way, so that equivalent arrays end up identical.

A quick example for 2d and 3d arrays:

[[1, 0]  == [[0, 1]   ==  [[1, 0]  ==  [[0, 1]
 [0, 1]]     [0, 1]]       [1, 0]]      [1, 0]]

[[[1, 1], [0, 2]], [[0, 0], [0, 0]]] == [[[0, 0], [0, 0]], [[1, 1], [0, 2]]] == [[[0, 0], [0, 0]], [[0, 2], [1, 1]]] ...

I don't care how it is ordered, I just want to find, handle, and afterwards eliminate duplicates.

I tried using numpy.sort(array), which sadly can miss duplicates (e.g. [[0,0],[0,1]] and [[0,1],[0,0]]). If I sort along every dimension, starting with the highest, it destroys my structure (e.g. [[1, 1], [0, 2]] becomes [[0, 1], [1, 2]]).

From a nested-list point of view, I think the best idea is to first sort all lists at depth n, and afterwards sort the nested lists at depths n-1 ... 0, ordering them first by sum and, where the sums are equal, by comparing the elements.
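That idea could be sketched in plain Python roughly as follows. This assumes nested lists of numbers (for example the result of `array.tolist()`), and the names `total` and `sort_by_sum` are made up for illustration:

```python
def total(x):
    """Sum of all numbers inside a (possibly nested) list."""
    return sum(map(total, x)) if isinstance(x, list) else x


def sort_by_sum(x):
    """Sort the deepest lists first, then each enclosing level by (sum, elements)."""
    if not isinstance(x, list):
        return x
    # Children are put into canonical order before they are compared.
    children = [sort_by_sum(c) for c in x]
    return sorted(children, key=lambda c: (total(c), c))


a = [[[1, 1], [0, 2]], [[0, 0], [0, 0]]]
b = [[[0, 0], [0, 0]], [[2, 0], [1, 1]]]
print(sort_by_sum(a) == sort_by_sum(b))  # True
```

Ordering by sum first is not strictly required for finding duplicates: any consistent ordering applied at every depth gives equal results for equivalent inputs.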


Solution

  • If I understand correctly, you could use a recursive sorter. I'm providing a pure Python solution here since I believe there is no advantage to using numpy in this case:

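    def sorter(lst):
        # Sort the children first, then sort the list of sorted children,
        # so every nesting level ends up in the same canonical order.
        if isinstance(lst, list):
            return sorted(map(sorter, lst))
        else:
            return lst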
    

    Or, perhaps more generically, using try/except (this also works on numpy arrays, although the result is a nested list rather than an array):

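    def sorter(lst):
        try:
            # Anything iterable (list, tuple, ndarray) is sorted recursively.
            return sorted(map(sorter, lst))
        except TypeError:
            # Non-iterable leaves such as numbers are returned unchanged.
            return lst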
    

    Examples:

    example1 = ([[1, 0], [0, 1]],
                [[0, 1], [0, 1]],
                [[1, 0], [1, 0]],
                [[0, 1], [1, 0]],
                [[0, 0], [1, 1]], # different
               )
    
    example2 = ([[[1,1],[0,2]],[[0,0],[0,0]]],
                [[[0,0],[0,0]],[[1,1],[0,2]]],
                [[[0,0],[0,0]],[[0,2],[1,1]]],
                [[[0,0],[0,0]],[[1,1],[2,0]]],
                [[[0,0],[0,0]],[[1,2],[1,0]]], # different
               )
    
    [sorter(x) for x in example1]
    
    [[[0, 1], [0, 1]],
     [[0, 1], [0, 1]],
     [[0, 1], [0, 1]],
     [[0, 1], [0, 1]],
     [[0, 0], [1, 1]]] # different
    
    [sorter(x) for x in example2]
    
    [[[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
     [[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
     [[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
     [[[0, 0], [0, 0]], [[0, 2], [1, 1]]],
     [[[0, 0], [0, 0]], [[0, 1], [1, 2]]]] # different