Evaluating / ranking in Pandas dataframe

I have Pandas DataFrame where I imported Excel spreadsheet which contains couple of hundreds rows. Each row is a company name with some fundamental data as ratios etc. in each column.

Symbol  Name  P/BV  PE     NCB

A       AGT   4.382 42.59  0.0145

AAAP    AdvAc 6.91  NaN    -0.0003

AAME    AtlAm 0.6205 29.55 0.0089

AAN     Aaron 2.093 23.89  0.0213

etc. There is more columns and more rows, but I think this is enough to illustrate the data. Sometimes there is no value for a ratio, so automatically there was filled NaN value.

What I would like to do is to evaluate each company:

each column (P/BV for example) is a universe that contains values from lowest to the highest - creating a scope of 100 %. Each company will be evaluated in new column: I want to assign percentile ranking for each company 1 to 100. If company has P/BV ratio that is in lowest 1% of the universe, it receives rank 100 (as lowest is the best here). The highest ratio receives rank 1.
if a value is missing - there is NaN - assign rank 50
I need to evaluate each column like that, so I need to create new column for each ratio column and fill it with the ranking values.
some ratios are best when it's value is highest, like the column NCB.

Later on I will add all the rank values for each company in the final column, which I can manage. I just cannot make this ranking to work. Can anybody help me?

Thank you in advance :-)

Solution

Here's an example using similar data. In this example:

Low P/B & PE receives high rank (inverse rank)
Missing value receives 50% (50th percentile)
High MCV receives high rank (direct rank)

Sample DataFrame:

  Symbol     P_BV     PE      NCB
0      A -0.59587  42.50 -1.42680
1      B  0.09875    NaN -0.82033
2      C  0.10987  29.55  0.73841
3      D  3.03278  23.89 -0.31283
4      E  0.70129  10.00 -1.00830

Now use DataFrame.rank with the parameter pct=True. The slighly time-consuming part is you'll need to apply it separately for different columns because you want to rank some ascending and some descending.

new_df = df.copy()
new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
                                     .rank(pct=True, ascending=False) * 100)
new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100  # ascending=True

Lastly, fill NaN values with 50:

new_df.fillna(value=50., inplace=True)

Your result looks like this:

  Symbol   P_BV     PE    NCB
0      A  100.0   25.0   20.0
1      B   80.0   50.0   60.0
2      C   60.0   50.0  100.0
3      D   20.0   75.0   80.0
4      E   40.0  100.0   40.0

Combined code you'll need:

new_df = df.copy()
new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
                                     .rank(pct=True, ascending=False) * 100)
new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100  # ascending=True
new_df.fillna(value=50., inplace=True)