
How to get the rank of a column in numpy 2d array?


Suppose I have an array:

a = np.array([[1,2,3,4],
              [4,2,5,6],
              [6,5,0,3]])

I want to get the rank of column 0 within each row (i.e. np.array([0, 1, 3])). Is there a short way to do this?

For a 1D array I can use np.sum(a < a[0]), but what about a 2D array? It seems that < does not broadcast the way I need.


Solution

  • Approach #1

    Use np.argsort along each row, then compare the result against 0, the index of the first column. This gives a boolean mask of the same shape as the input array, with one True per row at the position where the first column lands in sorted order. The column indices of those True entries are the desired ranks. So, the implementation would be -

    np.where(a.argsort(1)==0)[1]
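
To see why this works, the intermediate arrays can be inspected one step at a time. This is a minimal sketch using the array from the question; the names order, mask and rank_col0 are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [4, 2, 5, 6],
              [6, 5, 0, 3]])

# order[i, j] is the column index of the j-th smallest value in row i
order = a.argsort(1)

# True where the sorted position holds column 0; one True per row
mask = order == 0

# The column indices of the True entries are the ranks of column 0
rank_col0 = np.where(mask)[1]
print(rank_col0)
```

Because each row of order is a permutation of the column indices, the mask has exactly one match per row, so np.where returns one rank per row in row order.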
    

  • Approach #2

    Another way, which gets the ranks of all columns in one go, is a slight modification of the earlier method: apply argsort twice. The implementation would look like this -

    (a.argsort(1)).argsort(1)
    

    So, to get the rank of the first column, index into the first column of the result, like so -

    (a.argsort(1)).argsort(1)[:,0]
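
The double argsort works because the second argsort inverts the permutation produced by the first. As a hedged illustration, the same ranks can be built by inverting the permutation explicitly with index assignment, which avoids the second sort. The names order, rows and ranks are assumptions made for this sketch.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [4, 2, 5, 6],
              [6, 5, 0, 3]])

order = a.argsort(1)
rows = np.arange(a.shape[0])[:, None]

# Invert each row's permutation: the column at sorted position j gets rank j
ranks = np.empty_like(order)
ranks[rows, order] = np.arange(a.shape[1])

# Same result as applying argsort twice
assert np.array_equal(ranks, a.argsort(1).argsort(1))
print(ranks[:, 0])
```

For small arrays the one-liner with two argsort calls is simpler; the explicit inversion may only be worth it when the rows are long enough for the second sort to matter.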
    

    Sample run

    In [27]: a
    Out[27]: 
    array([[1, 2, 3, 4],
           [4, 2, 5, 6],
           [6, 5, 0, 3]])
    
    In [28]: np.where(a.argsort(1)==0)[1]
    Out[28]: array([0, 1, 3])
    
    In [29]: (a.argsort(1)).argsort(1) # Ranks for all cols
    Out[29]: 
    array([[0, 1, 2, 3],
           [1, 0, 2, 3],
           [3, 2, 0, 1]])
    
    In [30]: (a.argsort(1)).argsort(1)[:,0] # Rank for first col
    Out[30]: array([0, 1, 3])
    
    In [31]: (a.argsort(1)).argsort(1)[:,1] # Rank for second col
    Out[31]: array([1, 0, 2])
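
The 1D idea from the question can also be carried over to 2D with broadcasting, provided the selected column keeps a trailing axis of length 1. The sketch below is an alternative to the argsort approaches rather than part of them; col and rank_col are names chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [4, 2, 5, 6],
              [6, 5, 0, 3]])

col = 0

# a[:, [col]] has shape (3, 1), so it broadcasts against a's shape (3, 4);
# a[:, col] has shape (3,), which does not broadcast row-wise against (3, 4)
rank_col = (a < a[:, [col]]).sum(axis=1)
print(rank_col)
```

Note that this counts the values strictly smaller than the chosen entry, so tied entries in a row share the same rank, whereas argsort assigns distinct ranks to ties.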