As I've recently started learning web scraping, I thought I would try to parse an HTML table from this site using the requests and bs4 modules.
I know I need to access the td class elements inside tbody; at least, this is what the web page looks like:
When I try, though, it doesn't work properly: it only captures the td class elements from thead and not from tbody. Hence, I cannot capture anything but the table headers.
I assume it has something to do with the requests module.
import requests
url = 'https://vstup.edbo.gov.ua/statistics/requests-by-university/?qualification=1&education-base=40'
r = requests.get(url)
print(r.text)
The result is as follows (only the table-related part):
<table id="stats">
<caption></caption>
<thead>
<tr>
<td class="region">Регіон</td>
<td class="university">Назва закладу</td>
<td class="speciality">Спеціальність (спеціалізація)</td>
<td class="average-ball number" title="Середній конкурсний бал">СКБ</td>
<td class="requests-total number">Усього заяв</td>
<td class="requests-budget number">Заяв на бюджет</td>
</tr>
</thead>
<tbody></tbody>
</table>
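To check that the missing rows are not a BeautifulSoup parsing problem, here is a minimal sketch that uses only the standard library's html.parser to group the cell text of each table section in the response above. SERVER_HTML is the pasted markup, and SectionCellCollector is a hypothetical helper written for this illustration. It finds six header cells and no body cells, which suggests the server sends an empty tbody and the rows are filled in later in the browser, presumably by the page's JavaScript.

```python
from html.parser import HTMLParser

# Table markup as returned by requests.get(url).text (copied from the response above)
SERVER_HTML = """
<table id="stats">
<caption></caption>
<thead>
<tr>
<td class="region">Регіон</td>
<td class="university">Назва закладу</td>
<td class="speciality">Спеціальність (спеціалізація)</td>
<td class="average-ball number" title="Середній конкурсний бал">СКБ</td>
<td class="requests-total number">Усього заяв</td>
<td class="requests-budget number">Заяв на бюджет</td>
</tr>
</thead>
<tbody></tbody>
</table>
"""


class SectionCellCollector(HTMLParser):
    """Hypothetical helper: collects <td> text, grouped by thead/tbody."""

    def __init__(self):
        super().__init__()
        self.section = None  # 'thead' or 'tbody' while inside one of them
        self.in_cell = False
        self.cells = {"thead": [], "tbody": []}

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self.section = tag
        elif tag == "td" and self.section:
            self.in_cell = True
            self.cells[self.section].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self.section = None
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[self.section][-1] += data  # text may arrive in chunks


collector = SectionCellCollector()
collector.feed(SERVER_HTML)
print(len(collector.cells["thead"]), len(collector.cells["tbody"]))  # 6 0
```

Because a plain parser sees the same empty tbody, switching parsers alone would not recover the rows.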
So the tbody elements are missing from my response object, even though they are present in the web page's code. What am I doing wrong?
@Holdenweb suggested trying Selenium and everything worked.
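from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from bs4 import BeautifulSoup
url = 'https://vstup.edbo.gov.ua/statistics/requests-by-university/?qualification=1&education-base=40'
# Selenium 4 takes the geckodriver path through a Service object (executable_path was removed)
browser = webdriver.Firefox(service=Service(r'D:/folder/geckodriver.exe'))
browser.get(url)
html = browser.page_source  # serialized DOM, including rows the page's JavaScript has inserted so far
browser.quit()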
After that, I used BeautifulSoup on html and managed to parse the web page.
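The BeautifulSoup step itself isn't shown, so the following is only a minimal sketch of one way it could look. It assumes each rendered tbody row has one td per header cell. table_rows and the sample markup are hypothetical, and the sample values are made-up placeholders, not data from the site. Column keys are taken from the first class of each header cell (region, university, average-ball, ...), which avoids using the Ukrainian header text as dictionary keys.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Hypothetical helper: return the #stats table body as a list of dicts.

    Keys are the first CSS class of each header <td>; assumes one body <td>
    per header cell.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="stats")
    if table is None:
        return []
    # bs4 exposes "class" as a list, e.g. ["average-ball", "number"]
    keys = [td["class"][0] for td in table.select("thead td")]
    rows = []
    for tr in table.select("tbody tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(keys, cells)))
    return rows


# Placeholder markup with made-up values (NOT real data), shaped like the rendered table
sample = """
<table id="stats">
<thead><tr><td class="region">Регіон</td><td class="requests-total number">Усього заяв</td></tr></thead>
<tbody><tr><td>Region A</td><td>10</td></tr><tr><td>Region B</td><td>5</td></tr></tbody>
</table>
"""
print(table_rows(sample))
# [{'region': 'Region A', 'requests-total': '10'}, {'region': 'Region B', 'requests-total': '5'}]
```

Passed the HTML from requests.get(url).text instead, the same function returns an empty list, because that response's tbody contains no rows.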