Tags: python, pandas, dense-rank

Python Equivalent to SQL Rank


I want an equivalent of SQL's rank in the Python pandas module. I want to filter out all the dates that have rank = 1.

I have the following dataframe:

id    date
12    2021-06-01
12    2021-06-15
12    2021-06-21
34    2021-06-05
87    2021-06-19
53    2021-06-05

and I need to apply rank in the following way:

id    date          rank
12    2021-06-01      1
12    2021-06-15      2
12    2021-06-21      3
34    2021-06-05      1
87    2021-06-19      1
53    2021-06-05      1

The SQL for this would be:

select id, date, rank() over (partition by id order by date asc) as rank 
from table;

Solution

  • If I understand correctly, use GroupBy.rank:

    import pandas as pd

    # Convert to datetimes so the ranking is chronological, not lexical
    df['date'] = pd.to_datetime(df['date'])
    df['rank'] = df.groupby('id')['date'].rank(method='dense').astype(int)
    print(df)
       id       date  rank
    0  12 2021-06-01     1
    1  12 2021-06-15     2
    2  12 2021-06-21     3
    3  34 2021-06-05     1
    4  87 2021-06-19     1
    5  53 2021-06-05     1
    

    If the datetimes are sorted within each group, GroupBy.cumcount can be used instead:

    df = df.sort_values(['id','date'])
    df['rank'] = df.groupby('id')['date'].cumcount().add(1)
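
The two approaches above agree on the sample data because no id has a repeated date. When dates tie, they can differ. The sketch below is only an illustration: the duplicated date for id 12 is made up, and the column names `dense_rank` and `row_number` are not from the question. As far as I know, `method='min'` is the closest match to SQL `RANK()` (gaps after ties), `method='dense'` matches `DENSE_RANK()`, and `cumcount` on sorted data behaves like `ROW_NUMBER()`. It also shows one way to select the rows with rank 1.

```python
import pandas as pd

# Hypothetical data: id 12 has the same date twice, to expose tie handling.
df = pd.DataFrame({
    "id": [12, 12, 12, 12, 34],
    "date": pd.to_datetime(
        ["2021-06-01", "2021-06-01", "2021-06-15", "2021-06-21", "2021-06-05"]
    ),
})

grouped = df.groupby("id")["date"]

# method="min": tied rows share the lowest rank and the next rank is skipped,
# like SQL RANK().
df["rank"] = grouped.rank(method="min").astype(int)

# method="dense": tied rows share a rank and no rank is skipped,
# like SQL DENSE_RANK().
df["dense_rank"] = grouped.rank(method="dense").astype(int)

# cumcount on sorted rows numbers every row separately, like SQL ROW_NUMBER().
# The result is aligned back to df by index.
df["row_number"] = df.sort_values(["id", "date"]).groupby("id").cumcount().add(1)

# Select the rows holding the earliest date(s) for each id.
first_dates = df[df["rank"] == 1]

print(df)
print(first_dates)
```

With ties present, filtering on `rank == 1` or `dense_rank == 1` keeps every row that shares the earliest date, while filtering on `row_number == 1` keeps exactly one row per id.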