Tags: python, parsing, beautifulsoup, python-requests, html-parsing

Why are some elements missing from the response object? (Requests module)


As I've recently started learning web scraping, I thought I would try to parse an HTML table from this site using the requests and bs4 modules.

I know I need to access the td elements inside tbody, at least judging by how the page's markup looks in the browser.

When I try, though, it doesn't work properly: it only captures the td elements from thead, not those from tbody. Hence, I cannot capture anything but the table headers.

I assume it has something to do with the requests module.

import requests

url = 'https://vstup.edbo.gov.ua/statistics/requests-by-university/?qualification=1&education-base=40'
r = requests.get(url)
print(r.text)

The result is as follows (only the table-related part is pasted):

<table id="stats">
    <caption></caption>
    <thead>
    <tr>
        <td class="region">Регіон</td>
        <td class="university">Назва закладу</td>
        <td class="speciality">Спеціальність (спеціалізація)</td>
        <td class="average-ball number" title="Середній конкурсний бал">СКБ</td>
        <td class="requests-total number">Усього заяв</td>
        <td class="requests-budget number">Заяв на бюджет</td>
            </tr>
    </thead>
    <tbody></tbody>
</table>

So the tbody elements are missing in my response object, while they are present in the code of the web page. What am I doing wrong?
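
One way to confirm this is to count the cells in each section of the returned markup. The sketch below runs on a trimmed copy of the output pasted above rather than on a live request; the names sample_html and count_cells are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table markup returned by requests (see the output above).
sample_html = """
<table id="stats">
    <thead>
    <tr>
        <td class="region">Регіон</td>
        <td class="university">Назва закладу</td>
    </tr>
    </thead>
    <tbody></tbody>
</table>
"""


def count_cells(html):
    """Return (header_cells, body_cells) for the table with id="stats"."""
    soup = BeautifulSoup(html, "html.parser")
    header_cells = soup.select("#stats thead td")
    body_cells = soup.select("#stats tbody td")
    return len(header_cells), len(body_cells)


print(count_cells(sample_html))  # (2, 0)
```

A body count of zero means the rows are not in the HTML that requests received, so no BeautifulSoup selector can find them there.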


Solution

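  • @Holdenweb suggested trying Selenium, and it worked. The table rows are apparently filled in by JavaScript after the page loads; requests only downloads the initial HTML and does not run scripts, whereas Selenium drives a real browser.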

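    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service
    from bs4 import BeautifulSoup
    
    url = 'https://vstup.edbo.gov.ua/statistics/requests-by-university/?qualification=1&education-base=40'
    # Selenium 4 takes the driver path via a Service object; the executable_path=
    # argument of webdriver.Firefox was deprecated and later removed.
    browser = webdriver.Firefox(service=Service(executable_path=r'D:/folder/geckodriver.exe'))
    browser.get(url)
    html = browser.page_source  # HTML of the page as currently rendered by the browser
    browser.quit()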
    

    After that, I passed html to BeautifulSoup and managed to parse the web page.
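
The parsing step is not shown above, so here is a minimal sketch of what it might look like. It assumes the rendered page keeps the same table id ("stats") and puts one tr per record inside tbody. The function name parse_stats_table and the sample rows are made up for illustration; in practice you would pass in the html string from browser.page_source.

```python
from bs4 import BeautifulSoup


def parse_stats_table(html):
    """Return the body rows of the table with id="stats" as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="stats")
    if table is None or table.tbody is None:
        return []
    rows = []
    for tr in table.tbody.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


# Made-up stand-in for browser.page_source; the values are placeholders, not real data.
example_html = """
<table id="stats">
    <thead><tr><td class="region">Регіон</td><td class="requests-total number">Усього заяв</td></tr></thead>
    <tbody>
        <tr><td class="region">Region A</td><td class="requests-total number">10</td></tr>
        <tr><td class="region">Region B</td><td class="requests-total number">7</td></tr>
    </tbody>
</table>
"""

print(parse_stats_table(example_html))  # [['Region A', '10'], ['Region B', '7']]
```

If the function returns an empty list on the Selenium output, the rows may not have been rendered yet when page_source was read.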