Search code examples
pythonpandasdataframegreatest-n-per-group

How to select N rows with highest values from every group in pandas DataFrame


I am looking for a nice generic way to get 2 strategies with highest sharpe values for every ccyPair.

DataFrame (in code):

df = pd.DataFrame({
    'ccyPair': ['EURUSD', 'EURUSD', 'EURUSD', 'USDJPY', 'USDJPY', 'USDJPY'],
    'stype': ['SMA', 'Channel', 'Vol', 'SMA', 'Channel', 'Vol'],
    'sharpe': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    })

DataFrame (table view):

   ccyPair    stype  sharpe
 0  EURUSD      SMA     0.1
 1  EURUSD  Channel     0.2
 2  EURUSD      Vol     0.3
 3  USDJPY      SMA     0.4
 4  USDJPY  Channel     0.5
 5  USDJPY      Vol     0.6

Expected outcome has 2 EURUSD and 2 USDJPY strategies with best sharpe values:

   ccyPair    stype  sharpe
 0  EURUSD  Channel     0.2
 1  EURUSD      Vol     0.3
 2  USDJPY  Channel     0.5
 3  USDJPY      Vol     0.6

Solution

  • Use DataFrame.sort_values with GroupBy.tail for last bottom values:

    df = df.sort_values(['ccyPair','sharpe']).groupby('ccyPair').tail(2)
    print (df)
      ccyPair    stype  sharpe
    1  EURUSD  Channel     0.2
    2  EURUSD      Vol     0.3
    4  USDJPY  Channel     0.5
    5  USDJPY      Vol     0.6
    

    EDIT for @cs95 - if is using head is order of values different ;)

    df = df.sort_values(['ccyPair','sharpe'], ascending=False).groupby('ccyPair').head(2)
    print (df)
      ccyPair    stype  sharpe
    5  USDJPY      Vol     0.6
    4  USDJPY  Channel     0.5
    2  EURUSD      Vol     0.3
    1  EURUSD  Channel     0.2
    

    df = df.sort_values(['ccyPair','sharpe'], ascending=[True, False]).groupby('ccyPair').head(2)
    print (df)
      ccyPair    stype  sharpe
    2  EURUSD      Vol     0.3
    1  EURUSD  Channel     0.2
    5  USDJPY      Vol     0.6
    4  USDJPY  Channel     0.5