python, pandas, aggregate, rank

Pandas groupby then rank: correct syntax to specify order when passing dict of parameters


I'd like to specify the rank order in pandas. I know it can be done when calling rank directly on a groupby:

df.groupby([x, y]).rank(ascending=False)

Question 1

How can I specify the ranking order (ascending=False) inside the agg method?

df.groupby([x, y]).agg({
        ('r', 'c'): 'rank'
    })

Question 2

df.groupby([x, y])['r', 'c'].rank(ascending=False)

This throws an error: KeyError: Columns not found

How can I specify a multi-level column for rank?

The data structure is as follows:

                                        r
                                        c
z  x                          y
1  2016-11-01 00:00:00+00:00  3121    143
                              3923     11
                              3953      4
                              4880     12

Solution

  • I think you can select the column first, then group by the index levels and rank:

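    x = 'x'  # names of the index levels to group by
    y = 'y'
    # select the ('r', 'c') column as a Series, then rank within each (x, y) group
    b = df[('r', 'c')].groupby(level=[x, y]).rank(ascending=False)
    print(b)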
    z  x                          y   
    1  2016-11-01 00:00:00+00:00  3121    1.0
                                  3923    1.0
                                  3953    1.0
                                  4880    1.0
    Name: (r, c), dtype: float64
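
Every rank in that output is 1.0 because each (x, y) pair in the sample holds a single row, so each row is ranked alone. Here is a minimal sketch of this, assuming a frame rebuilt by hand from the four rows shown in the question. The construction code and the names by_xy and by_x are illustrative and not part of the original post:

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame:
# row index levels (z, x, y) and a two-level column key ('r', 'c').
idx = pd.MultiIndex.from_product(
    [[1], [pd.Timestamp("2016-11-01", tz="UTC")], [3121, 3923, 3953, 4880]],
    names=["z", "x", "y"],
)
cols = pd.MultiIndex.from_tuples([("r", "c")])
df = pd.DataFrame([[143], [11], [4], [12]], index=idx, columns=cols)

# Grouping by x and y: every group has exactly one row, so every rank is 1.0.
by_xy = df[("r", "c")].groupby(level=["x", "y"]).rank(ascending=False)

# Grouping by x only: the four rows share one group and are ranked
# against each other in descending order.
by_x = df[("r", "c")].groupby(level="x").rank(ascending=False)

print(by_xy)
print(by_x)
```

If the ranks are meant to compare rows that share the same x, grouping by x alone (as in by_x) may be closer to the intent.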
    

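    Or select the column after grouping. There, ['r', 'c'] is read as two separate column names, which is why it raises the KeyError. Pass a tuple of keys instead by adding a trailing comma after ('r','c'):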

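    x = 'x'  # names of the index levels to group by
    y = 'y'
    # the trailing comma makes a one-element tuple of column keys
    b = df.groupby(level=[x, y])[('r', 'c'), ].rank(ascending=False)
    print(b)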
                                        r
                                        c
    z x                         y        
    1 2016-11-01 00:00:00+00:00 3121  1.0
                                3923  1.0
                                3953  1.0
                                4880  1.0
    #print (df)
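
For Question 1, one possible workaround is to avoid agg altogether, since rank returns one value per row rather than one per group. transform accepts a callable, so the ascending flag can be set there. Selecting with a list of keys before grouping is another option that keeps the result as a DataFrame, and it avoids the trailing-comma tuple, which newer pandas releases may interpret differently. This is a sketch on hand-built sample data, grouped by x only so the ranks differ. The names ranked_df and ranked_tf are hypothetical:

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame.
idx = pd.MultiIndex.from_product(
    [[1], [pd.Timestamp("2016-11-01", tz="UTC")], [3121, 3923, 3953, 4880]],
    names=["z", "x", "y"],
)
cols = pd.MultiIndex.from_tuples([("r", "c")])
df = pd.DataFrame([[143], [11], [4], [12]], index=idx, columns=cols)

# Option A: select with a list of column keys first, then group and rank.
# The double brackets keep the two-level column header in the result.
ranked_df = df[[("r", "c")]].groupby(level="x").rank(ascending=False)

# Option B: use transform with a callable so rank options can be passed.
# The lambda is applied within each group.
ranked_tf = df.groupby(level="x").transform(lambda s: s.rank(ascending=False))

print(ranked_df)
print(ranked_tf)
```

Both options return a frame aligned with the original index, so the result can be assigned back as a new column if needed.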