Tags: web-scraping, yahoo-finance

Python code to scrape ticker symbols from Yahoo finance


I have a list of more than 1,000 companies I could invest in, and I need the ticker symbol for each of them. I'm having trouble stripping the output of the soup and looping through all the company names.

Here is an example of the lookup page: https://finance.yahoo.com/lookup?s=asml. The idea is to replace asml by building the URL as 'https://finance.yahoo.com/lookup?s=' + company, so I can loop through all the companies.
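Because the URL is built from one company name at a time, names containing spaces or characters such as '&' should be URL-encoded before they are appended. A minimal sketch using the standard library's `urllib.parse.quote_plus` (the helper name `lookup_url` is hypothetical, and whether Yahoo's lookup page still responds to this URL pattern is an assumption):

```python
from urllib.parse import quote_plus

LOOKUP_URL = 'https://finance.yahoo.com/lookup?s='

def lookup_url(company_name):
    # quote_plus turns spaces into '+' and escapes characters such as '&'
    return LOOKUP_URL + quote_plus(company_name)

companies = ['Abbott Laboratories', 'ABBVIE', 'Abercrombie', 'Abiomed', 'Accenture Plc']
urls = [lookup_url(name) for name in companies]
for url in urls:
    print(url)
```

With `requests`, passing `params={'s': company_name}` to `requests.get('https://finance.yahoo.com/lookup', ...)` does the same encoding for you.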

    companies = df
        Company name
    0   Abbott Laboratories
    1   ABBVIE
    2   Abercrombie
    3   Abiomed
    4   Accenture Plc

This is the code I have now. The stripping doesn't work, and neither does the loop over all the companies.

    import requests
    from bs4 import BeautifulSoup

    # Scrape the ticker symbols that the lookup page lists for one company name
    def scrape_stock_symbols(company):
        url = 'https://finance.yahoo.com/lookup?s=' + company
        page = requests.get(url)

        soup = BeautifulSoup(page.text, "html.parser")
        # Each matching <td> is the symbol column of one search result
        cells = soup.find_all('td', attrs={'class': 'data-col0 Ta(start) Pstart(6px) Pend(15px)'})

        # The matches already are <td> elements, so strip their text directly
        return [cell.text.strip() for cell in cells]

    # Loop through every company name (iterating the DataFrame itself yields column labels)
    company_symbols = {}
    for company in companies['Company name']:
        try:
            company_symbols[company] = scrape_stock_symbols(company)
        except requests.RequestException:
            # Skip a company whose request fails and move on to the next one
            continue

Another difficulty is that the Yahoo Finance symbol lookup returns many matching companies for each name, so I will have to clean the data afterwards. I want to use the AMS exchange as the default: if a company is listed on multiple exchanges, I am only interested in its AMS ticker symbol. The final goal is to create a new dataframe:

        Company name           Company_symbol
    0   Abbott Laboratories    ABT
    1   ABBVIE                 ABBV
    2   Abercrombie            ANF
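The clean-up step described above (one ticker per company, AMS preferred) can be sketched in plain Python, assuming the scraped results have been collected as `(query, symbol, exchange)` tuples in the order the lookup returned them. The function name `clean_lookup_rows` and the sample rows are hypothetical illustrations, not real Yahoo data:

```python
def clean_lookup_rows(rows, preferred_exchange='AMS'):
    """Reduce raw lookup rows to one ticker per queried company.

    rows: iterable of (query, symbol, exchange) tuples, in lookup order.
    Keeps the first preferred-exchange symbol if there is one,
    otherwise the first symbol returned for that query.
    """
    best = {}
    for query, symbol, exchange in rows:
        if query not in best:
            best[query] = (symbol, exchange)  # first result is the fallback
        elif exchange == preferred_exchange and best[query][1] != preferred_exchange:
            best[query] = (symbol, exchange)  # upgrade to a preferred-exchange listing
    return {query: symbol for query, (symbol, _) in best.items()}

# Hypothetical scraped rows for two made-up companies
raw_rows = [
    ('Company A', 'AAA', 'NYQ'),
    ('Company A', 'AAA.AS', 'AMS'),
    ('Company A', 'AAB.AS', 'AMS'),
    ('Company B', 'BBB', 'NMS'),
]
print(clean_lookup_rows(raw_rows))
```

The resulting dict can be passed to `pd.DataFrame(list(result.items()), columns=['Company name', 'Company_symbol'])` to get the table above.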

Solution

  • Here's a solution that doesn't require any scraping. It uses a package called yahooquery (disclaimer: I'm the author), which utilizes an API endpoint that returns symbols for a user's query. You can do something like this:

    import pandas as pd
    import yahooquery as yq
    
    def get_symbol(query, preferred_exchange='AMS'):
        try:
            data = yq.search(query)
        except ValueError:  # Will catch JSONDecodeError
            print(query)
            return None
        else:
            quotes = data.get('quotes', [])
            if len(quotes) == 0:
                return 'No Symbol Found'
    
            # Default to the top match, but prefer a listing on the preferred exchange
            symbol = quotes[0]['symbol']
            for quote in quotes:
                if quote.get('exchange') == preferred_exchange:
                    symbol = quote['symbol']
                    break
            return symbol
    
    companies = ['Abbott Laboratories', 'ABBVIE', 'Abercrombie', 'Abiomed', 'Accenture Plc']
    df = pd.DataFrame({'Company name': companies})
    df['Company symbol'] = df['Company name'].apply(get_symbol)
    print(df)
    
    
              Company name Company symbol
    0  Abbott Laboratories            ABT
    1               ABBVIE           ABBV
    2          Abercrombie            ANF
    3              Abiomed           ABMD
    4        Accenture Plc            ACN
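For a list of more than 1,000 names, it may help to run the lookup in a loop that skips duplicates and records failures instead of stopping. A minimal sketch: `lookup_all` is a hypothetical helper, `fake_lookup` is a stand-in for `get_symbol` so the example runs offline, and the optional `pause` between calls is an assumption about being polite to the endpoint, not a documented rate limit:

```python
import time

def lookup_all(names, lookup, pause=0.0):
    """Apply lookup to every unique name; collect failures instead of stopping."""
    symbols = {}
    failed = []
    for name in dict.fromkeys(names):  # drops duplicate names, keeps order
        try:
            symbols[name] = lookup(name)
        except Exception as exc:  # record which name failed and why, then continue
            failed.append((name, repr(exc)))
        if pause:
            time.sleep(pause)
    return symbols, failed

# Stand-in for get_symbol so the sketch runs without network access
def fake_lookup(name):
    if name == 'Unknown Co':
        raise ValueError('no JSON returned')
    return name.upper()[:4]

symbols, failed = lookup_all(['Abiomed', 'Unknown Co', 'Abiomed'], fake_lookup)
print(symbols)
print(failed)
```

In practice you would call `lookup_all(companies, get_symbol, pause=0.5)` and retry the names in `failed` later.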