Tags: python-3.x, web-scraping, python-requests, xmlhttprequest

Why can't I access the full response HTML of a website?


I would like to check this website periodically and get a warning when there is an opening. The page makes no client-side XHR calls to an API that I could use, so I decided to scrape it. However, parts of the HTML that I see in the browser are missing from the response to my request. The part of the website I'm interested in is the timetable.

I then made this GET request, intending to parse the result with BeautifulSoup.

import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36', 
    'From': '[email protected]'
}
url = 'https://service.berlin.de/terminvereinbarung/termin/day/'
cd = {'sessionid': '123..'}  # session cookie copied from the browser
r = requests.get(url, headers=headers, cookies=cd)
print(r.text)  # decoded HTML exactly as the server sent it

However, none of the classes that make up that timetable are present in the response. Is there a way to get the full HTML and then scrape it?


Solution

  • You can't see those tables because they aren't part of the static page. Many modern websites build their content with client-side JavaScript, which runs in the browser after the page is opened, rather than on the server before the HTML is sent. So when you fetch the page with the requests library, you only get the HTML the server sent, not the HTML as it looks after the JavaScript has executed.

    The solution is to load the page in an actual browser rather than just fetching the HTML. This lets the JavaScript load the content before you scrape the page.

    I suggest you check out Selenium, a library that lets you control a browser programmatically. With it you can navigate to the website, wait for the content to load, and then scrape it, all from Python. The documentation is here: https://selenium-python.readthedocs.io/
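
As a rough illustration of the Selenium approach, here is a minimal sketch rather than a tested solution for this particular site. It assumes Selenium 4 with a recent Chrome installed. The names `fetch_rendered_html` and `count_elements_with_class` are hypothetical helpers, and the CSS selector and class name you pass in are placeholders you would need to replace with whatever you see for the timetable in your browser's inspector.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


def fetch_rendered_html(url, wait_selector, timeout=15):
    """Open url in headless Chrome and return the HTML after rendering.

    wait_selector is a CSS selector for an element that only exists once
    the dynamic content has loaded (an assumption about the target page).
    """
    # Imported here so the parsing helper above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element appears; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

You could then call `fetch_rendered_html(url, "table")` with the URL from the question and pass the result to `count_elements_with_class(html, "some-class")`, where both the selector and the class name are stand-ins for the real ones. Running that on a schedule and alerting when the count is above zero would cover the periodic check.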