How would I loop through HTML table rows in Python?
Just to let y'all know, I am working on the website https://schools.texastribune.org/districts/. What I'm trying to do is click each link in the table body (?) and extract the total number of students.
What I have so far:
response = requests.get("https://schools.texastribune.org/districts/")
soup = BeautifulSoup(response.text)
data = []
for a in soup.find_all('a', {'class': 'table table-striped'}):
    response = requests.get(a.get('href'))
    asoup = BeautifulSoup(response.text)
    data.append({
        'url': a.get('href'),
        'title': a.h2.get_text(strip=True),
        'content': asoup.article.get_text(strip=True)
    })
pd.DataFrame(data)
This is my first ever time web scraping something.
You should not have class_="td" when finding the <td> elements; they don't have any class.
There are no <ul> elements in the table, so view = match.find('ul', class_="tr") won't find anything. You need to find the <a> element, get its href, and load that page to get the total students.
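import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://schools.texastribune.org"
soup = BeautifulSoup(requests.get(BASE_URL + "/districts/").text, "lxml")

d = {}
for match in soup.find_all("td"):
    link = match.find("a")
    # Read the attribute with .get("href"); link.href would look for a child <href> tag.
    if link and link.get("href"):
        # urljoin handles both relative and absolute hrefs.
        school_page = requests.get(urljoin(BASE_URL, link.get("href")))
        # BeautifulSoup needs the response body (.text), not the Response object.
        school_soup = BeautifulSoup(school_page.text, "lxml")
        total_div = school_soup.find("div", class_="metric", string="Total students")
        if total_div:
            amount = total_div.find("p", class_="metric-value")
            if amount:
                d[link.text] = amount.text
print(d)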
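If you want to check the row loop without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML snippet and the row_links helper are made up for illustration; the real page's markup may differ, so treat this as an assumption-laden example of the "find the <a>, get its href" step rather than the exact code for that site.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; the real page's HTML may differ.
SAMPLE_HTML = """
<table class="table table-striped">
  <tbody>
    <tr><td><a href="/districts/example-isd/">Example ISD</a></td><td>Example County</td></tr>
    <tr><td><a href="/districts/sample-isd/">Sample ISD</a></td><td>Sample County</td></tr>
    <tr><td>No link here</td><td>Nowhere County</td></tr>
  </tbody>
</table>
"""

def row_links(html, base_url):
    """Return {link text: absolute URL} for the first <a> found in each table row."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for row in soup.find_all("tr"):
        link = row.find("a")
        # link.get("href") reads the attribute; rows without a link are skipped.
        if link and link.get("href"):
            links[link.get_text(strip=True)] = urljoin(base_url, link.get("href"))
    return links

links = row_links(SAMPLE_HTML, "https://schools.texastribune.org")
for name, url in links.items():
    print(name, url)
```

Each URL in links can then be passed to requests.get(...) and parsed the same way as in the loop above. Looping over the <tr> rows instead of every <td> means each row is only visited once.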