Tags: python, python-2.7, web-scraping, html-parsing

Scraping a website with clickable content in Python


I would like to scrape the content of the following website:

http://financials.morningstar.com/ratios/r.html?t=AMD

There, under Key Ratios, I would like to click the "Growth" button and then scrape the data in Python.

How can I do that?


Solution

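  • You can solve it with requests + BeautifulSoup. The Key Ratios tables are not part of the initial page HTML; they are loaded by an asynchronous GET request to the http://financials.morningstar.com/financials/getKeyStatPart.html endpoint, and that request is what you need to simulate. The JSON response carries the tables as HTML under the 'componentData' key, and the Growth table is located inside the div with id="tab-growth":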

    from bs4 import BeautifulSoup
    import requests
    
    
    url = 'http://financials.morningstar.com/ratios/r.html?t=AMD'
    keystat_url = 'http://financials.morningstar.com/financials/getKeyStatPart.html'
    
    with requests.Session() as session:
        # add a browser-like User-Agent on top of the session's default headers
        session.headers.update({'User-Agent': 'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30'})
    
        # visit the target url first (presumably so the session picks up any cookies it sets)
        session.get(url)
    
        params = {
            'callback': '',
            't': 'XNAS:AMD',  # exchange code and ticker
            'region': 'usa',
            'culture': 'en-US',
            'cur': '',
            'order': 'asc',
            '_': '1426047023943'  # looks like a millisecond timestamp (jQuery-style cache buster)
        }
        response = session.get(keystat_url, params=params)
        response.raise_for_status()  # fail loudly on HTTP errors
    
        # the JSON response carries the tables as HTML in 'componentData'
        soup = BeautifulSoup(response.json()['componentData'], 'html.parser')
    
        # grab the rows of the Growth table
        for row in soup.select('div#tab-growth table tr'):
            print(row.text)
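If you want each row split into separate cells rather than the run-together string that row.text gives you, the request parameters and the parsing step can be factored into small helpers and tested offline. This is a minimal sketch under stated assumptions: build_keystat_params and parse_growth_table are hypothetical helper names, the '_' parameter is assumed to be a jQuery-style millisecond timestamp (so a fresh one is generated), and sample_html is a made-up fragment with placeholder values, not real Morningstar data, that only mimics the div#tab-growth structure the answer relies on.

```python
import time

from bs4 import BeautifulSoup


def build_keystat_params(ticker, exchange='XNAS'):
    """Hypothetical helper: query parameters for getKeyStatPart.html.

    Assumption: '_' is a cache-busting millisecond timestamp, so a fresh
    value is generated instead of reusing a hard-coded one.
    """
    return {
        'callback': '',
        't': '{}:{}'.format(exchange, ticker),
        'region': 'usa',
        'culture': 'en-US',
        'cur': '',
        'order': 'asc',
        '_': str(int(time.time() * 1000)),
    }


def parse_growth_table(component_html):
    """Hypothetical helper: rows of the Growth tab as lists of cell strings.

    Assumes the table sits inside div#tab-growth; rows whose cells are all
    empty (e.g. spacer rows) are skipped.
    """
    soup = BeautifulSoup(component_html, 'html.parser')
    rows = []
    for tr in soup.select('div#tab-growth table tr'):
        # keep header and data cells in document order, whitespace stripped
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        if any(cells):
            rows.append(cells)
    return rows


# Made-up fragment with placeholder values, only mimicking the structure.
sample_html = '''
<div id="tab-growth">
  <table>
    <tr><th></th><th>2013-12</th><th>2014-12</th></tr>
    <tr><th>Revenue %</th><td>1.0</td><td>2.0</td></tr>
  </table>
</div>
'''

print(parse_growth_table(sample_html))
```

In the session code you would then pass build_keystat_params('AMD') as params and call parse_growth_table(response.json()['componentData']) instead of printing row.text, which keeps the network part and the parsing part separately testable.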