
Python scraping from Yahoo finance error: 'NoneType' object has no attribute 'parent'


I'm trying to scrape data from income statements on Yahoo Finance using Python.

I would like to extract the Net Income values, which are enclosed in:

[Screenshot: Yahoo_Finance_Screen]

import re, requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/q/is?s=AAPL&annual'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
pattern = re.compile('Net Income')

title = soup.find('strong', text=pattern)
row = title.parent.parent 
cells = row.find_all('td')[1:] #exclude the <td> with 'Net Income'

values = [ c.text.strip() for c in cells ]

But I'm getting this error:

[Screenshot: error_console, showing the error quoted in the title]

Do you know what could be causing the issue?


Solution

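  • soup.find() returns None when no element matches, so the 'strong' lookup came back empty and title.parent raised the error. You can obtain the Net Income values by searching for the 'div' tag instead. This should do the trick: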

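    import re, requests
    from bs4 import BeautifulSoup
    
    url = 'https://finance.yahoo.com/q/is?s=AAPL&annual'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    # Find the <div> whose text matches 'Net Income' (the label cell)
    title = soup.find('div', string=re.compile('Net Income'))
    # Go up two levels to reach the element that holds the whole row
    row = title.parent.parent
    # Collect the text of every child of the row; the first one is the label
    values = [i.text for i in row]
    print(values[1:])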
    

    Result:

    ['57,215,000', '55,256,000', '59,531,000', '48,351,000', '45,687,000']
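
The same idea can be tried offline with a minimal sketch. Everything in it is an assumption made for illustration: SAMPLE_HTML is a made-up fragment (its class names and layout are not Yahoo Finance's actual markup, and the Total Revenue numbers are placeholders), and find_row_values is a hypothetical helper. It shows why find() returning None leads to the error and how a guard avoids it.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<div class="row">
  <div class="cell">
    <div>Total Revenue</div>
  </div>
  <div class="cell">1,000</div>
  <div class="cell">2,000</div>
</div>
<div class="row">
  <div class="cell">
    <div>Net Income</div>
  </div>
  <div class="cell">57,215,000</div>
  <div class="cell">55,256,000</div>
</div>
"""


def find_row_values(html, label):
    """Return the cell texts that follow the `label` cell, or [] if the label is absent."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("div", string=re.compile(re.escape(label)))
    if title is None:
        # find() found nothing; calling title.parent here would raise
        # AttributeError: 'NoneType' object has no attribute 'parent'
        return []
    row = title.parent.parent  # label <div> -> cell -> row
    cells = row.find_all("div", recursive=False)  # direct child cells only
    return [c.get_text(strip=True) for c in cells][1:]  # drop the label cell


# The sample has no <strong> element, so this lookup returns None.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
missing = soup.find("strong", string=re.compile("Net Income"))
print(missing)  # None

print(find_row_values(SAMPLE_HTML, "Net Income"))  # ['57,215,000', '55,256,000']
```

Checking for None before touching .parent turns a confusing AttributeError into an empty result, which is easier to handle if the page layout changes again.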