Search code examples
pythonpandasdataframesortingyfinance

Sort multiple columns with one key


I have a following code which gets data from yfinance for single ticker (GOOGL) and then sort this data by Close price (Ascending).

import bs4 as bs
import requests
import yfinance as yf
import datetime
import pandas as pd
import time
import numpy as np

starttimer = time.time()
resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})

tickers = []

for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text
    tickers.append(ticker)

tickers = [s.replace('\n', '') for s in tickers]

start = datetime.datetime(2013, 2, 27)
end = datetime.datetime(2023, 3, 3)
data = yf.download("GOOGL MSFT AAPL", start=start, end=end)

df = data.stack().reset_index().rename(index=str, columns={"level_1": "Symbol"}).sort_values(['Symbol','Date'])
df['Date'] = df['Date'].apply(lambda x: x.strftime('%d%m%Y'))
df.set_index('Date', inplace=True)
df = pd.DataFrame(data)
print(df)
eodPrices   = pd.DataFrame(data=data);
percentageChange = round(eodPrices.pct_change()*100,2).shift(-1);
percentageChange.sort_values(by=['Close'],inplace=True)
print(percentageChange)
endtimer = time.time()
elapsed_time = endtimer - starttimer
print('Execution time: ', elapsed_time, 'seconds' )

enter image description here

However when I want to get sorted columns for multiple tickers

data = yf.download("GOOGL MSFT AAPL", start=start, end=end)

I receive following errors:

ValueError: The column label 'Close' is not unique. For a multi-index, the label must be a tuple with elements corresponding to each level.

My aim is to sort ticker GOOGL by its Close prices, ticker MSFT by its Close Prices and ticker AAPL by its close prices and then print these 'Close' columns (just these 3 columns).


Solution

  • To get only Close column, you can use:

    df1 = yf.download("GOOGL MSFT AAPL", start=start, end=end)['Close']
    

    To sort values for each column (ignore Date index?), you can use:

    df2 = pd.concat([df[col].sort_values().reset_index(drop=True)
                        for col in df.columns], axis=1)
    

    Output:

    >>> df1
                      AAPL      GOOGL        MSFT
    Date                                         
    2013-02-26   16.034643  19.773024   27.370001
    2013-02-27   15.877500  20.014515   27.809999
    2013-02-28   15.764286  20.050051   27.799999
    2013-03-01   15.373929  20.174925   27.950001
    2013-03-04   15.001786  20.558058   28.150000
    ...                ...        ...         ...
    2023-02-24  146.710007  89.129997  249.220001
    2023-02-27  147.919998  89.870003  250.160004
    2023-02-28  147.410004  90.059998  249.419998
    2023-03-01  145.309998  90.360001  246.270004
    2023-03-02  145.910004  92.000000  251.110001
    
    [2522 rows x 3 columns]
    
    >>> df2
                AAPL       GOOGL        MSFT
    0      13.947500   19.166918   27.370001
    1      14.001786   19.390640   27.799999
    2      14.063571   19.460711   27.809999
    3      14.161786   19.567818   27.870001
    4      14.216786   19.583584   27.910000
    ...          ...         ...         ...
    2517  179.380005  148.852005  341.269989
    2518  179.449997  148.918503  341.950012
    2519  179.699997  148.926498  342.450012
    2520  180.330002  149.031006  342.540009
    2521  182.009995  149.838501  343.109985
    
    [2522 rows x 3 columns]