Search code examples
pythonpandasdataframegroup-byrank

Pandas rank values within groupby, starting a new rank if diff is greater than 1


I have a sample dataframe as follows:

data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
df=pd.DataFrame.from_dict(data)
df['WeekDiff'] = df.groupby('Store')['Week'].diff().fillna(1)

I added a difference column to find the gaps in the Week column within my data. I have been trying to groupby Store and somehow use the differences column to achieve the below output but with no success. I need the ranks to start from each occurence of a value greater than one until the next such value. Please see a sample output I'd like to achieve.

result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
      'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
      'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}

I am new to python and pandas and I've been trying to google this all day, but couldn't find a solution. Could you please help me how to do this?

Thank you in advance!


Solution

  • You could try as follows:

    import pandas as pd
    
    data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
          'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73]}
    
    df = pd.DataFrame(data)
    
    df['Rank'] = df.groupby('Store')['Week'].diff()>1
    df['Rank'] = df.groupby('Store')['Rank'].cumsum().add(1)
    
    # check with expected output:
    result_data={'Store':[1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2],
          'Week':[1,2,3,4,5,6,19,20,21,22,1,2,50,51,52,60,61,62,70,71,72,73],
          'Rank':[1,1,1,1,1,1,2,2,2,2,1,1,2,2,2,3,3,3,4,4,4,4]}
    
    result_df = pd.DataFrame(result_data)
    
    df.equals(result_df)
    # True
    

    Or as a (lengthy) one-liner:

    df['Rank'] = df.set_index('Store').groupby(level=0)\
        .agg(Rank=('Week','diff')).gt(1).groupby(level=0)\
            .cumsum().add(1).reset_index(drop=True)