Search code examples
pythonrdownloadyahoo-finance

How can efficiently download a large list of tickers from Yahoo Finance?


I have a really large list of tickers for which I want to download financial data using Yahoo finance.

I am now working with python, but R solutions are also welcomed.

Here is the piece of python code and the excel file with the tickers is also attached:

import yfinance as yf
from yahoofinancials import YahooFinancials

# get tickers 
mf_tickers = pd.read_excel('tickers_mutual_funds.xlsx')

# download data 
tickers = mf_tickers.Symbol
funds_financials = YahooFinancials(tickers)

data = funds_financials.get_historical_price_data(start_date='2019-01-01', 
                                                  end_date='2019-12-31',
                                                  time_interval='weekly')

The problem is that it takes ages,like hours, in my computer to load this size of data.

What is a better and more efficient way to load these data? As said, I am happy with both python or R solutions.

File to load: https://www.dropbox.com/s/1l41tk8gxzqvutd/tickers_mutual_funds%20copy.xlsx?dl=0

Thanks


Solution

  • I'd offer up a package called yahooquery, with the DISCLAIMER: I am the author of the package.

    Here's how you could do it:

    from yahooquery import Ticker
    
    mf_symbols = pd.read_excel('tickers_mutual_funds.xlsx')
    symbols = mf_tickers.Symbol.tolist()
    
    tickers = Ticker(symbols, asynchronous=True)
    
    data = tickers.history(start='2019-01-01', end='2019-12-31', interval='1wk')
    

    For the most part, the history method will return a pandas.DataFrame. However, there's lots of ticker symbols in your list that weren't around in 2019 or just don't have data altogether. So, it will be a dictionary, with each symbol as a key.

    Here's how you can get to a single dataframe:

    # Add symbol column to each dataframe
    for key in data:
        if isinstance(data[key], pd.DataFrame):
            data[key]['symbol'] = key
    
    # Concat all dataframes
    df = pd.concat([data[k] for k in data if isinstance(data[k], pd.DataFrame)])
    

    Here's some stats from when I ran the same function above:

    • Runtime: ~5 minutes
    • Valid Symbols (symbols that returned data): 19,415
    • Total rows in dataframe: 1,008,180
    • Symbols with no data: 6,571