How can efficiently download a large list of tickers from Yahoo Finance?

I have a really large list of tickers for which I want to download financial data using Yahoo finance.

I am now working with python, but R solutions are also welcomed.

Here is the piece of python code and the excel file with the tickers is also attached:

import yfinance as yf
from yahoofinancials import YahooFinancials

# get tickers 
mf_tickers = pd.read_excel('tickers_mutual_funds.xlsx')

# download data 
tickers = mf_tickers.Symbol
funds_financials = YahooFinancials(tickers)

data = funds_financials.get_historical_price_data(start_date='2019-01-01', 
                                                  end_date='2019-12-31',
                                                  time_interval='weekly')

The problem is that it takes ages,like hours, in my computer to load this size of data.

What is a better and more efficient way to load these data? As said, I am happy with both python or R solutions.

File to load: https://www.dropbox.com/s/1l41tk8gxzqvutd/tickers_mutual_funds%20copy.xlsx?dl=0

Thanks

Solution

I'd offer up a package called yahooquery, with the DISCLAIMER: I am the author of the package.

Here's how you could do it:

from yahooquery import Ticker

mf_symbols = pd.read_excel('tickers_mutual_funds.xlsx')
symbols = mf_tickers.Symbol.tolist()

tickers = Ticker(symbols, asynchronous=True)

data = tickers.history(start='2019-01-01', end='2019-12-31', interval='1wk')

For the most part, the history method will return a pandas.DataFrame. However, there's lots of ticker symbols in your list that weren't around in 2019 or just don't have data altogether. So, it will be a dictionary, with each symbol as a key.

Here's how you can get to a single dataframe:

# Add symbol column to each dataframe
for key in data:
    if isinstance(data[key], pd.DataFrame):
        data[key]['symbol'] = key

# Concat all dataframes
df = pd.concat([data[k] for k in data if isinstance(data[k], pd.DataFrame)])

Here's some stats from when I ran the same function above:

Runtime: ~5 minutes
Valid Symbols (symbols that returned data): 19,415
Total rows in dataframe: 1,008,180
Symbols with no data: 6,571