Search code examples
pythonpandasranking

Evaluating / ranking in Pandas dataframe


I have Pandas DataFrame where I imported Excel spreadsheet which contains couple of hundreds rows. Each row is a company name with some fundamental data as ratios etc. in each column.

Symbol  Name  P/BV  PE     NCB

A       AGT   4.382 42.59  0.0145

AAAP    AdvAc 6.91  NaN    -0.0003

AAME    AtlAm 0.6205 29.55 0.0089

AAN     Aaron 2.093 23.89  0.0213

etc. There is more columns and more rows, but I think this is enough to illustrate the data. Sometimes there is no value for a ratio, so automatically there was filled NaN value.

What I would like to do is to evaluate each company:

  • each column (P/BV for example) is a universe that contains values from lowest to the highest - creating a scope of 100 %. Each company will be evaluated in new column: I want to assign percentile ranking for each company 1 to 100. If company has P/BV ratio that is in lowest 1% of the universe, it receives rank 100 (as lowest is the best here). The highest ratio receives rank 1.

  • if a value is missing - there is NaN - assign rank 50

  • I need to evaluate each column like that, so I need to create new column for each ratio column and fill it with the ranking values.

  • some ratios are best when it's value is highest, like the column NCB.

Later on I will add all the rank values for each company in the final column, which I can manage. I just cannot make this ranking to work. Can anybody help me?

Thank you in advance :-)


Solution

  • Here's an example using similar data. In this example:

    • Low P/B & PE receives high rank (inverse rank)
    • Missing value receives 50% (50th percentile)
    • High MCV receives high rank (direct rank)

    Sample DataFrame:

      Symbol     P_BV     PE      NCB
    0      A -0.59587  42.50 -1.42680
    1      B  0.09875    NaN -0.82033
    2      C  0.10987  29.55  0.73841
    3      D  3.03278  23.89 -0.31283
    4      E  0.70129  10.00 -1.00830
    

    Now use DataFrame.rank with the parameter pct=True. The slighly time-consuming part is you'll need to apply it separately for different columns because you want to rank some ascending and some descending.

    new_df = df.copy()
    new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
                                         .rank(pct=True, ascending=False) * 100)
    new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100  # ascending=True
    

    Lastly, fill NaN values with 50:

    new_df.fillna(value=50., inplace=True)
    

    Your result looks like this:

      Symbol   P_BV     PE    NCB
    0      A  100.0   25.0   20.0
    1      B   80.0   50.0   60.0
    2      C   60.0   50.0  100.0
    3      D   20.0   75.0   80.0
    4      E   40.0  100.0   40.0
    

    Combined code you'll need:

    new_df = df.copy()
    new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
                                         .rank(pct=True, ascending=False) * 100)
    new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100  # ascending=True
    new_df.fillna(value=50., inplace=True)