Tags: python, html, selenium-webdriver, web-scraping, beautifulsoup

Scraping a table appearing on click with python


I want to scrape information from this page: http://www.etf.com/SHE.

Specifically, I want to scrape the table which appears when you click "View all" under the "TOP 10 HOLDINGS" (you have to scroll down on the page a bit).

I am new to web scraping and have tried using BeautifulSoup to do this. However, the "onclick" handler seems to be the problem: the HTML I scrape directly from the page doesn't include the table I want to obtain.

I am a bit confused about my next step: should I use something like selenium or can I deal with the issue in an easier/more efficient way?

Thanks.

My current code:

from bs4 import BeautifulSoup
import requests


Soup = BeautifulSoup
my_url = 'http://www.etf.com/SHE'
# requests only downloads the static HTML; it does not run the page's JavaScript
page = requests.get(my_url)
htmltxt = page.text

soup = Soup(htmltxt, "html.parser")
print(soup)

Solution

  • You can get a JSON response from the API: http://www.etf.com/view_all/holdings/SHE. The table you're looking for is an HTML string stored under the 'view_all' key of that JSON, so you can parse it with BeautifulSoup as usual.

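    import requests
    from bs4 import BeautifulSoup as Soup
    
    url = 'http://www.etf.com/SHE'
    api = "http://www.etf.com/view_all/holdings/SHE"
    # Send the headers of an AJAX request made from the ETF page
    headers = {'X-Requested-With': 'XMLHttpRequest', 'Referer': url}
    page = requests.get(api, headers=headers)
    page.raise_for_status()  # stop early on an HTTP error instead of failing in .json()
    # The JSON payload holds the table as an HTML string under 'view_all'
    htmltxt = page.json()['view_all']
    soup = Soup(htmltxt, "html.parser")
    # One list of cell texts per table row
    data = [[td.get_text(strip=True) for td in tr.find_all('td')] for tr in soup.find_all('tr')]
    
    print('\n'.join(': '.join(row) for row in data))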
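If you want to check the parsing step without hitting the site, or avoid the BeautifulSoup dependency, here is a minimal sketch that swaps in the standard library's html.parser. It assumes, as the code above does, that the endpoint returns JSON with the table as an HTML fragment under 'view_all'. The class name TableRowParser, the function parse_view_all, and the sample payload are hypothetical, made up for illustration only.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Like the find_all('td') version above, <th> header cells are ignored.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows (lists of cell strings)
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text chunks of the cell currently being parsed

    def _close_cell(self):
        # Finish the open cell, if any (also covers a missing </td>)
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any (also covers a missing </tr>)
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def close(self):
        super().close()
        self._close_row()


def parse_view_all(payload_text):
    """Hypothetical helper: JSON text from the endpoint -> list of table rows."""
    html = json.loads(payload_text)["view_all"]
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Hypothetical payload shaped like the endpoint's response
    sample = json.dumps({
        "view_all": "<table><tr><td>Holding A</td><td>5.00%</td></tr>"
                    "<tr><td>Holding B</td><td>4.00%</td></tr></table>"
    })
    rows = parse_view_all(sample)
    print("\n".join(": ".join(row) for row in rows))
    # Holding A: 5.00%
    # Holding B: 4.00%
```

With real data you would pass page.text from the requests call above to parse_view_all. BeautifulSoup is still the more forgiving choice for messy HTML; this version mainly shows that the content is already plain HTML once you reach the 'view_all' key.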