I have Pandas DataFrame where I imported Excel spreadsheet which contains couple of hundreds rows. Each row is a company name with some fundamental data as ratios etc. in each column.
Symbol Name P/BV PE NCB
A AGT 4.382 42.59 0.0145
AAAP AdvAc 6.91 NaN -0.0003
AAME AtlAm 0.6205 29.55 0.0089
AAN Aaron 2.093 23.89 0.0213
etc. There is more columns and more rows, but I think this is enough to illustrate the data. Sometimes there is no value for a ratio, so automatically there was filled NaN value.
What I would like to do is to evaluate each company:
each column (P/BV for example) is a universe that contains values from lowest to the highest - creating a scope of 100 %. Each company will be evaluated in new column: I want to assign percentile ranking for each company 1 to 100. If company has P/BV ratio that is in lowest 1% of the universe, it receives rank 100 (as lowest is the best here). The highest ratio receives rank 1.
if a value is missing - there is NaN - assign rank 50
I need to evaluate each column like that, so I need to create new column for each ratio column and fill it with the ranking values.
some ratios are best when it's value is highest, like the column NCB.
Later on I will add all the rank values for each company in the final column, which I can manage. I just cannot make this ranking to work. Can anybody help me?
Thank you in advance :-)
Here's an example using similar data. In this example:
Sample DataFrame:
Symbol P_BV PE NCB
0 A -0.59587 42.50 -1.42680
1 B 0.09875 NaN -0.82033
2 C 0.10987 29.55 0.73841
3 D 3.03278 23.89 -0.31283
4 E 0.70129 10.00 -1.00830
Now use DataFrame.rank
with the parameter pct=True
. The slighly time-consuming part is you'll need to apply it separately for different columns because you want to rank some ascending and some descending.
new_df = df.copy()
new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
.rank(pct=True, ascending=False) * 100)
new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100 # ascending=True
Lastly, fill NaN
values with 50:
new_df.fillna(value=50., inplace=True)
Your result looks like this:
Symbol P_BV PE NCB
0 A 100.0 25.0 20.0
1 B 80.0 50.0 60.0
2 C 60.0 50.0 100.0
3 D 20.0 75.0 80.0
4 E 40.0 100.0 40.0
Combined code you'll need:
new_df = df.copy()
new_df.loc[:, ['P_BV', 'PE']] = (df[['P_BV', 'PE']]
.rank(pct=True, ascending=False) * 100)
new_df.loc[:, 'NCB'] = df['NCB'].rank(pct=True) * 100 # ascending=True
new_df.fillna(value=50., inplace=True)