
How to sort a 2d numpy array by the columns with the biggest sum


I have a 2D array with shape (35, 6004) and I want to reorder its columns by their sums, largest first. For example, given

array([[5, 3, 13], 
       [1, 2, 20],
       [6, 2,  6]])

I want the sorted array to look like this:

array([[13, 5, 3],
       [20, 1, 2],
       [ 6, 6, 2]])

I tried finding the index of the column with the largest sum:

def find_max_col(o):
    t = o.sum(axis=0)
    te = t.tolist()
    return te.index(t.max())

Then I use the output of that function to sort the array:

test = array[array[:, find_max_col(array)].argsort()]

and I run this to check whether it was successful:

t1 = test.sum(axis=0)
print(t1)

As I understand it, if I sort according to the column with the biggest sum, the code above should print the sums of all the columns in descending order.

Is my checking code wrong, did I make a mistake in the sorting, or did I not even find the correct index of the column to sort by?


Solution

  • I'm not sure if your solution is incorrect, but it is certainly more complicated than necessary:

    >>> import numpy as np
    >>> a = np.array([[5, 3, 13],
    ...               [1, 2, 20],
    ...               [6, 2,  6]])
    
    >>> a[:, a.sum(axis=0).argsort()]  # sort columns small-to-large
    array([[ 3,  5, 13],
           [ 2,  1, 20],
           [ 2,  6,  6]])
    
    >>> a[:, (-a.sum(axis=0)).argsort()]  # negate the sums to sort large-to-small
    array([[13,  5,  3],
           [20,  1,  2],
           [ 6,  6,  2]])
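
As a rough sketch of why the check in the question never showed descending sums: indexing with `array[idx]` reorders rows, and reordering rows leaves every column sum where it was. The snippet below reuses the example array and compares that attempt with the column-based indexing from the answer. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[5, 3, 13],
              [1, 2, 20],
              [6, 2,  6]])

col_sums = a.sum(axis=0)  # one sum per column

# Attempt from the question: argsort of a single column's VALUES gives a
# row order, and a[idx] applies that order to the rows.
row_sorted = a[a[:, col_sums.argmax()].argsort()]
# Only rows moved, so each column holds the same numbers and the
# column sums come out in their original order.
sums_after_row_sort = row_sorted.sum(axis=0)

# Column-based version: argsort of the column SUMS gives a column order,
# and a[:, order] applies it to the columns. Negating sorts descending.
order = np.argsort(-col_sums)
col_sorted = a[:, order]
sums_after_col_sort = col_sorted.sum(axis=0)

print(sums_after_row_sort)   # same order as col_sums
print(sums_after_col_sort)   # descending
```

One caveat, assuming the array might have an unsigned integer dtype: negating unsigned sums wraps around instead of producing negative numbers, so reversing the ascending order with `col_sums.argsort()[::-1]` is a safer alternative there (tied columns then come out in reversed order).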