I have a dataset with stock price history that I have run a correlation matrix on in pandas. I have unstacked the correlation matrix and filtered it to remove duplicates/redundant values and sorted it like so.
corr_pairs = corr_matrix.unstack().drop_duplicates().sort_values(kind='quicksort')
corr_pairs = corr_pairs[corr_pairs < 1]
print('Sorted Pairs: ')
print(corr_pairs)
Sorted Pairs:
tsla meta 0.709581
googl meta 0.816701
tsla googl 0.916947
Now I wish to extract the names of the stocks that had the highest correlation in the set and assign them to variables such as stock1 and stock2. If I do corr_pairs[-1] this will produce the value for the highest correlation, I need the names of the stocks that have the highest correlation, in this case, tsla and googl.
Thank you to anyone who knows what to do! Appreciate the help!
Your "corr_pairs" will in fact be a pd.Series
with a pd.MultiIndex
:
import pandas as pd
d = {('tsla', 'meta'): 0.709581,
('googl', 'meta'): 0.816701,
('tsla', 'googl'): 0.916947}
corr_pairs = pd.Series(d)
print(corr_pairs.index)
MultiIndex([( 'tsla', 'meta'),
('googl', 'meta'),
( 'tsla', 'googl')],
)
So, the MultiIndex is basically an array with tuples. We can access the tuple for the max value by using pd.Series.idxmax
, and then simply assign the result to two variables:
stock1, stock2 = corr_pairs.idxmax()
print(f'stock1 = {stock1}, stock2 = {stock2}')
# stock1 = tsla, stock2 = googl