I want to scrape information from this page.
Specifically, I want to scrape the table which appears when you click "View all" under the "TOP 10 HOLDINGS" (you have to scroll down on the page a bit).
I am new to web scraping and have tried using BeautifulSoup to do this. However, the table only appears after the "onclick" handler fires, so the HTML I scrape directly from the page doesn't include the table I want.
I am a bit confused about my next step: should I use something like Selenium, or can I deal with the issue in an easier/more efficient way?
Thanks.
My current code:
from bs4 import BeautifulSoup
import requests
Soup = BeautifulSoup
my_url = 'http://www.etf.com/SHE'
page = requests.get(my_url)
htmltxt = page.text
soup = Soup(htmltxt, "html.parser")
print(soup)
You can get a JSON response from the API endpoint http://www.etf.com/view_all/holdings/SHE. The table you're looking for is stored as an HTML string under the 'view_all' key.
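import requests
from bs4 import BeautifulSoup as Soup

url = 'http://www.etf.com/SHE'
api = 'http://www.etf.com/view_all/holdings/SHE'
# Mimic the AJAX request the page makes when "View all" is clicked.
headers = {'X-Requested-With': 'XMLHttpRequest', 'Referer': url}
page = requests.get(api, headers=headers)
page.raise_for_status()  # stop early on an HTTP error
# The JSON value under 'view_all' is an HTML fragment containing the table.
htmltxt = page.json()['view_all']
soup = Soup(htmltxt, 'html.parser')
# One list of cell texts per table row.
data = [[td.text for td in tr.find_all('td')] for tr in soup.find_all('tr')]
print('\n'.join(': '.join(row) for row in data))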
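If you want to see what the parsing step does without hitting the site, here is a minimal offline sketch. It assumes the response has the shape described above (a JSON object whose 'view_all' value is an HTML table fragment). The payload below is made up for illustration and is not real data from etf.com, and the names `TableRows` and `rows_from_payload` are hypothetical helpers. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs with no third-party packages.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_payload(payload_text):
    """Decode the JSON text, then parse the HTML stored under 'view_all'."""
    fragment = json.loads(payload_text)["view_all"]
    parser = TableRows()
    parser.feed(fragment)
    parser.close()
    return parser.rows


# Made-up payload for illustration only; not real data from etf.com.
sample = json.dumps({
    "view_all": "<table><tr><td>Example Corp</td><td>1.23%</td></tr>"
                "<tr><td>Sample Inc</td><td>0.98%</td></tr></table>"
})
rows = rows_from_payload(sample)
print("\n".join(": ".join(row) for row in rows))
```

With a live response you would pass `page.text` to `rows_from_payload` instead of `sample`, assuming the endpoint still returns JSON in that shape.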