I have a dataframe of stock returns for several companies on the S&P. I have caluclate their returns.
I want to have a for loop which gets the top 5 (for instance) companies with the highest return values (on average over the period) and lowest 5 companies return values from a large dataframe. Then I want to plot it.
I have a code which gives me all the companies' returns and plots them.
graph_df = pd.DataFrame()
for name in df['Name'].unique():
print(name) # keeping track of which tickers are included
name_df = df.copy()[df['Name']==name] # creating a new df, copying in the name columns
name_df[f"{name}_factor"] = name_df["Difference"]
if graph_df.empty:
graph_df = name_df[[f"{name}_factor"]]
else:
graph_df = graph_df.join(name_df[f"{name}_factor"])
This is a snippet from print(graph_df):
StockRet1 StockRet2 StockRet3 StockRet4 StockRet5
Dates
1990-01-31 0.000000 0.000000 0.000000 0.000000 0.000000
1990-02-28 -0.098820 0.000000 -0.076981 0.019755 -0.012170
1990-03-30 0.023653 0.000000 -0.011115 0.044565 -0.042621
1990-04-30 0.010303 0.000000 -0.074512 0.067319 -0.003569
The plot is very messy as you can see, so I want to only show the highest and the lowest. I appreciate all help!
when you have the entire dataFrame:
df = df[np.concatenate((df.sum().sort_values().index[:5],df.sum().sort_values().index[-5:]))]
this gives you the top 5 and bottom 5. you just have to plot them.
to avoid any overlapping between array you could wrap the concatenate in a set()
to get unique values.