Tags: python, web-scraping, beautifulsoup, finance, yahoo-finance

Scrape ticker sector information from Yahoo finance


I get an error message when I try to scrape the "Sector" of a ticker from Yahoo Finance. I followed the library documentation's advice to select the correct parent and child elements from the HTML page, but I could not capture the ticker's sector (for example, the sector for AAPL is Technology):

I located the sector's HTML in the page source (view source) of the Yahoo Finance profile page.

Here is my attempt:

from bs4 import BeautifulSoup
import requests

url = 'https://finance.yahoo.com/quote/AAPL/profile?p=AAPL'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
sector_element = soup.find('span', text='Sector(s)').find_next('span', class_='Fw(600)')
print(sector_element.text)

I was expecting to get "Technology" but instead got the following error message:

AttributeError: 'NoneType' object has no attribute 'find_next'
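The error means the first `soup.find()` call matched nothing: `find()` returns `None` when no element matches, and calling `.find_next()` on `None` raises the `AttributeError`. A minimal offline reproduction, using made-up HTML as a stand-in for whatever page the server actually returned:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a response that lacks the "Sector(s)" label
html = "<html><body><p>Something went wrong</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no element matches
label = soup.find("span", string="Sector(s)")
print(label)  # None

error_message = ""
try:
    # Chaining a method call onto None is what raises the AttributeError
    label.find_next("span", class_="Fw(600)")
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object has no attribute 'find_next'
```

So the selector itself is not necessarily wrong; the HTML that was parsed simply did not contain the label.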

Solution

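  • The AttributeError means soup.find() returned None: most likely the HTML the server sent back to requests' default User-Agent does not contain the "Sector(s)" span. Try setting a browser-like User-Agent HTTP header so the server returns the right response: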

    import requests
    from bs4 import BeautifulSoup
    
    # Send a browser-like User-Agent; the server may return different HTML
    # to requests' default User-Agent
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0'}
    
    url = 'https://finance.yahoo.com/quote/AAPL/profile?p=AAPL'
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    # string= is the current name of the older text= argument
    sector_element = soup.find('span', string='Sector(s)').find_next('span', class_='Fw(600)')
    print(sector_element.text)
    

    Prints:

    Technology
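If the page layout changes or the server again returns a different page, the one-liner above will raise the same AttributeError. A slightly more defensive sketch splits fetching from parsing and returns None instead of crashing. The function names `parse_sector` and `fetch_sector` are hypothetical, and the selectors (`Sector(s)` label, `Fw(600)` class) are assumed to match the markup as it was when this answer was written; Yahoo changes its pages over time.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, as in the answer above
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/111.0"
}


def parse_sector(html: str) -> Optional[str]:
    """Return the text of the span after the 'Sector(s)' label, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("span", string="Sector(s)")
    if label is None:
        # Label missing: probably not the profile page we expected
        return None
    value = label.find_next("span", class_="Fw(600)")
    return value.get_text(strip=True) if value is not None else None


def fetch_sector(ticker: str) -> Optional[str]:
    """Download the Yahoo Finance profile page for `ticker` and extract its sector."""
    url = f"https://finance.yahoo.com/quote/{ticker}/profile?p={ticker}"
    r = requests.get(url, headers=HEADERS, timeout=10)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return parse_sector(r.text)
```

Usage: `print(fetch_sector("AAPL"))`. Keeping `parse_sector` separate lets you test the selector against saved HTML without hitting the network. Without the `headers` argument, requests identifies itself as `python-requests/<version>` (the value of `requests.utils.default_user_agent()`), which is what the question's code sent.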