Search code examples
pythonpandasrank

Applying Ranks to every group in Pandas Groupby


Say I have a simple dataset (namely df1) like this:

 ID     Name     Max_FileID
--------------------------------
  1      Dog          3
  1      Dog          3
  1      Dog          3
  2      Bird         1
  3      Cat          5
  3      Cat          5

I would like to group the dataset by ID (I have no problem with this):

df1.groupby('ID')

Then add a new column using Max_FileID + Rank.

The result should look like this

 ID     Name     Max_FileID     Rank
------------------------------------------
  1      Dog          3           4
  1      Dog          3           5
  1      Dog          3           6
  2      Bird         1           2
  3      Cat          5           6
  3      Cat          5           7

To help understand better what I am trying to achieve, here is a SQL equivalent:

SELECT
    ID,
    Name,
    Max_FileID,
    Max_FileID + ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID)
FROM df1

In addition, I created a custom function:

def sequential_ranking(self, iterable, max_id):
  try:
    max_id = max_id - 1
    seq_nums = list()
    for num in iterable:
      max_id += 1
      seq_nums.append(max_id)
    return seq_nums

Not sure if there is a built-in function for it. Thanks in advance!!


Solution

  • You can use DataFrameGroupBy.rank function which returns the rank values in the group and if you specify the method parameter to first then ranks are assigned in order they appear in the group.

    You can use this:

    df["Rank"] = df["Max_FileID"] + df.groupby("ID")["Max_FileID"].rank(method="first").astype(int)
    

    Result:

    >>> print(df)
    
       ID  Name  Max_FileID  Rank
    0   1   Dog           3     4
    1   1   Dog           3     5
    2   1   Dog           3     6
    3   2  Bird           1     2
    4   3   Cat           5     6
    5   3   Cat           5     7