Search code examples
pythonpython-3.xpandasrunning-count

How to create rank column in Python based on other columns


I have a python dataframe that looks like the following:

enter image description here

This dataframe has been sorted in descending order by 'transaction_count'. I want to create another column in that dataframe called 'rank' that contains the count of occurrences of cust_ID. My desired output would look something like the following:

enter image description here

For cust_ID = 1234 with transaction_count = 4, the rank would be 1, for the next appearance of cust_ID = 1234, the rank would be 2 and so on.

I tried the following among other things:

df['rank'] = df["cust_ID"].value_counts()
df.head(10)

But the rank column gets created as all NaN values

enter image description here

Any suggestions on how to approach this would be greatly appreciated!


Solution

  • Use groupby + cumcount:

    df['rank'] = df.groupby('cust_ID').cumcount() + 1
    print(df['rank'])
    

    Output

    0    1
    1    2
    2    1
    3    1
    4    2
    5    3
    Name: rank, dtype: int64