Search code examples
pandasranking

Ranking pandas data frame rows on multiple columns


I am new to Pandas. I am trying to understand how to do something in pandas, that I do in SQL -

I have a table like -

Account Company Blotter
112233  10      62
233445  12      62
233445  10      66
343454  21      66
343454  21      64
768876  25      54

In SQL, if a given account appears in multiple rows I would use rank() and if I want to give a preference to a certain company, I would put a case statement to force that company to be prioritized. I can also use Blotter column as an additional rank parameter. e.g.

rank() over(
    partition by ACCOUNT 
    order by case 
                when COMPANY='12' then 0 
                when COMPANY='21' then 1 
                else COMPANY 
             end, 
             case 
                when BLOTTER ='66' then 0 
                else BLOTTER 
             end
)

expected output:

   Account  Company  Blotter  rank
0   112233       10       62     1
1   233445       12       62     1
2   233445       10       66     2
3   343454       21       66     1
4   343454       21       64     2
5   768876       25       54     1

Solution

  • You may want to try this:

    # recompute the sort criteria for company and blotter
    ser_sort_company= df['Company'].map({12: 0, 21: 1}).fillna(df['Company'])
    ser_sort_blotter= df['Blotter'].map({12: 0, 21: 1}).fillna(df['Blotter'])
    df['rank']= (df
         # temporarily create sort columns
         .assign(sort_company=ser_sort_company)
         .assign(sort_blotter=ser_sort_blotter)
         # temporarily sort the result
         # this replaces the ORDER BY part
         .sort_values(['sort_company', 'sort_blotter'])
         # group by Account to replace the PARTITION BY part
         .groupby('Account')
         # get the position of the record in the group (RANK part)
         .transform('cumcount') + 1
    )
    
    df
    

    It evaluates to:

       Account  Company  Blotter  rank
    0   112233       10       62     1
    1   233445       12       62     1
    2   233445       10       66     2
    3   343454       21       66     2
    4   343454       21       64     1
    5   768876       25       54     1