Search code examples
pythonpandasgroup-byrank

How to rank over multiple rows and columns (dataframe)


Small sample of data:

df = pd.DataFrame({'Customer_ID': [1,1,1,1, 2,2,2,2], 'Customer Age' : [15,15,15,15, 50,50,50,50],   'Similar Customer ID': [5, 8, 11, 15, 22, 28, 31, 11], 'Age': [40, 43, 37, 11, 29, 42, 14, 33],
})

Question

How do I add a column that rank the customer with related customers, i.e., Customer 1 is 15 years old, I want the desired output to rank that customer as second youngest compared to 'Similar Customer':

df = pd.DataFrame({'Customer_ID': [1,1,1,1, 2,2,2,2], 'Customer Age' : [15,15,15,15, 50,50,50,50],   'Similar Customer ID': [5, 8, 11, 15, 22, 28, 31, 11], 'Age': [40, 43, 37, 11, 29, 42, 14, 33], 'Age Rank': [2, 2, 2, 2, 5,5,5,5]
})


Solution

  • You can just compare the Customer Age to Age then take the sum (over each customer):

    df['Age Rank'] = (df['Customer Age'].ge(df['Age'])
                        .groupby(df['Customer_ID'])
                        .transform('sum') + 1
                     )
    print(df)
    

    Output:

       Customer_ID  Customer Age  Similar Customer ID  Age  Age Rank
    0            1            15                    5   40         2
    1            1            15                    8   43         2
    2            1            15                   11   37         2
    3            1            15                   15   11         2
    4            2            50                   22   29         5
    5            2            50                   28   42         5
    6            2            50                   31   14         5
    7            2            50                   11   33         5