I tried to scrape the "Growth Estimates" table from the URL below using Beautiful Soup and requests, but the table is never found. When I use the browser's inspection tool I can see the table is there, and I couldn't see anything suggesting it is loaded dynamically, but I could be wrong.
URL: https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL
Can someone explain the issue and offer a solution?
import requests
from bs4 import BeautifulSoup


def get_growth_data(symbol):
    # f-string, so that {symbol} is actually substituted into the URL
    url = f"https://finance.yahoo.com/quote/{symbol}/analysis?p={symbol}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the table containing the growth data by its CSS classes
    table = soup.find("table", class_="W(100%) M(0) BdB Bdc($seperatorColor) Mb(25px)")
    if table is None:
        print("Table not found.")
        return []

    # Extract the growth values (second column) from each row
    growth_values = []
    rows = table.find_all("tr")
    for row in rows:
        columns = row.find_all("td")
        if len(columns) >= 2:
            growth_values.append(columns[1].text)
    return growth_values


symbol = 'AAPL'
growth_data = get_growth_data(symbol)
print(growth_data)
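One way to check whether the table is really in the HTML that requests receives (as opposed to what the browser shows after running JavaScript) is to inspect the raw response body instead of the parsed soup. This is a minimal sketch; describe_response and fetch_and_describe are hypothetical helper names, and the "Growth Estimates" marker is simply the table's visible heading.

```python
import requests

URL = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"


def describe_response(status_code, html, marker="Growth Estimates"):
    """Summarise whether an HTML body contains the marker text and any <table> tags."""
    return {
        "status": status_code,
        "has_marker": marker in html,
        "table_tags": html.lower().count("<table"),
    }


def fetch_and_describe(url, headers=None):
    """Fetch url (optionally with extra headers) and summarise the raw response body."""
    response = requests.get(url, headers=headers, timeout=10)
    return describe_response(response.status_code, response.text)
```

For example, compare fetch_and_describe(URL) with fetch_and_describe(URL, {"User-Agent": "Mozilla/5.0 ..."}). If only the second reports has_marker as True, the table is present in the server's HTML and the difference comes from the request headers rather than from JavaScript rendering.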
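The server doesn't return the expected page to requests' default User-Agent (python-requests/<version>). To get the correct response, set a browser-like User-Agent HTTP header in your request: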
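from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL'
# Browser-like User-Agent so the server returns the page that contains the table
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# Select the <table> whose text contains "Growth Estimates" instead of relying on CSS class names
table = soup.select_one('table:-soup-contains("Growth Estimates")')
# Newer pandas versions expect a file-like object rather than a literal HTML string
df = pd.read_html(StringIO(str(table)))[0]
print(df)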
Prints:
           Growth Estimates    AAPL  Industry  Sector(s)  S&P 500
0              Current Qtr.  -0.80%       NaN        NaN      NaN
1                 Next Qtr.   5.40%       NaN        NaN      NaN
2              Current Year  -2.30%       NaN        NaN      NaN
3                 Next Year   9.90%       NaN        NaN      NaN
4  Next 5 Years (per annum)   8.02%       NaN        NaN      NaN
5  Past 5 Years (per annum)  23.64%       NaN        NaN      NaN
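To see why the header matters: unless told otherwise, requests identifies itself with a User-Agent of the form python-requests/<version>, which a site can easily recognise as a script. The sketch below prepares (but never sends) two requests so their headers can be compared offline. build_request is a hypothetical helper, and BROWSER_UA is the Firefox string used in the answer.

```python
import requests

URL = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0"


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request so its final headers can be inspected."""
    with requests.Session() as session:
        if user_agent is not None:
            # Override the session's default User-Agent header
            session.headers["User-Agent"] = user_agent
        # prepare_request merges the session's default headers into the request
        return session.prepare_request(requests.Request("GET", url))


default_req = build_request(URL)
browser_req = build_request(URL, BROWSER_UA)
print(default_req.headers["User-Agent"])
print(browser_req.headers["User-Agent"])
```

The first line printed is requests' own identifier; the second is the browser string that the answer passes via headers=.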