
Python web scraping: how to click 'Next' using the Requests-HTML library


I'm trying to get the data from "https://fortune.com/global500/2019/search/" using the Python requests-html module. I'm able to get the first 100 items (the first page) after rendering the page's JavaScript. To load the second page you have to click "Next", so currently I only get the first 100 items.

When I click "Next" in the browser, the URL in the address bar does not change, so I don't know how to get the following pages using requests-html.

from requests_html import HTMLSession

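def get_fortune500():
    """Return the rows of the first rendered page as lists of cell strings."""
    companies = []
    url = 'https://fortune.com/global500/2019/search/'
    session = HTMLSession()
    r = session.get(url)
    # Run the page's JavaScript so the table exists in the DOM.
    r.html.render(wait=1, retries=2)
    table = r.html.find('div.rt-tbody', first=True)
    if table is None:
        # The table did not render, so there is nothing to parse.
        return companies
    for row in table.find('div.rt-tr-group'):
        # Strip the leading '$' and the thousands separators from each cell.
        row_data = [cell.text.lstrip('$').replace(',', '')
                    for cell in row.find('div.rt-td')]
        companies.append(row_data)
    return companies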

fortune_list = get_fortune500()
print(fortune_list)
print(len(fortune_list))

I really appreciate your time.


Solution

  • Here is the full list of all 500 companies:

    https://content.fortune.com/wp-json/irving/v1/data/franchise-search-results?list_id=2666483

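    The website stores the response of this API in the browser's IndexedDB, and only after that does the frontend take control. That is why clicking "Next" does not change the URL: the pages are served from data already loaded.

    So instead of clicking through pages, you can read that response directly from this first request.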
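
A minimal sketch of that idea, assuming the endpoint above still responds with JSON. The answer does not describe the shape of the payload, so `largest_record_list` is a hypothetical helper that simply looks for the longest list of objects anywhere in the response; inspect the real payload and adjust before relying on it. It uses only the standard library, and the `User-Agent` header is an assumption in case the server rejects the default one.

```python
import json
from urllib.request import Request, urlopen

# URL taken from the answer above; it may have changed since it was posted.
API_URL = ("https://content.fortune.com/wp-json/irving/v1/data/"
           "franchise-search-results?list_id=2666483")


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON."""
    # Assumption: a browser-like User-Agent avoids being rejected as a bot.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def largest_record_list(payload):
    """Return the longest list made up only of dicts found in a nested payload.

    Hypothetical heuristic: the payload layout is not documented in the
    answer, so this walks the whole structure instead of assuming key names.
    """
    best = []
    stack = [payload]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, list):
            if len(node) > len(best) and all(isinstance(x, dict) for x in node):
                best = node
            stack.extend(node)
    return best


# Usage (performs a network request, so it is left commented out):
# data = fetch_json(API_URL)
# companies = largest_record_list(data)
# print(len(companies))
```

Since everything arrives in a single response, there is no need for `render()` or for simulating a click on "Next".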