python request innerhtml

How to webscrape changing HTML?


I am learning web scraping and wondering how to scrape a page whose HTML changes after you press a button. I can do it with Selenium, but that is slow. How would one do it with requests?

Example: I start at https://www.collegeswimming.com/swimmer/356597/ and want to scrape the table that appears when you click the "Fastest" button on the page. Note that the table does not exist in the HTML source until you press "Fastest", and after you press it the URL is unchanged: it is still https://www.collegeswimming.com/swimmer/356597/.

I used Inspect Element and, under the Network tab, looked at the request that is made when I click "Fastest". The request goes to https://www.collegeswimming.com/swimmer/356597/times/fastest/. It is not possible to navigate to this URL directly, as it just leads back to the original https://www.collegeswimming.com/swimmer/356597. I tried using requests like so:

import requests

r = requests.get("https://www.collegeswimming.com/swimmer/356597/times/fastest")
print(r.text)
print(r.content)
r.json()

Sadly, none of these work. The response I am looking for is the one shown after clicking "Fastest", which can be viewed under Inspect Element -> Network -> fastest/ -> Response, but the response I get from the code above is just the HTML of the original page, https://www.collegeswimming.com/swimmer/356597.

I would greatly appreciate some help!


Solution

  • If you look at the request headers for that request in the Network tab, you should see an X-Requested-With header with a value of XMLHttpRequest, indicating it's an AJAX call. You can add this header to your own request like this:

    import requests

    url = "https://www.collegeswimming.com/swimmer/356597/times/fastest/"
    r = requests.get(url, headers={"X-Requested-With": "XMLHttpRequest"})
    print(r.text)
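Once the fragment comes back, you will probably want the table rows rather than raw text. Below is a minimal sketch of one way to do that. It assumes the AJAX response is an HTML fragment containing ordinary `<tr>`/`<td>` markup, which is not confirmed here. The names `fetch_fragment`, `parse_table_rows`, and `TableRowParser` are hypothetical helpers made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_fragment(url):
    """Request url the way the page's own AJAX call does and return the body."""
    # Imported here so the parsing helper above works without requests installed.
    import requests

    r = requests.get(
        url,
        headers={"X-Requested-With": "XMLHttpRequest"},
        timeout=10,  # arbitrary; avoids hanging forever on a stalled connection
    )
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return r.text
```

With these in place, `parse_table_rows(fetch_fragment(url))` would give one list per table row, provided the assumption about the response format holds. If the fragment turns out to be JSON or differently structured HTML, the parsing step needs to change accordingly.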