I'm scraping a page and I have to get the number of employees from this format:
<h5>Number of Employees</h5>
<p>
20
</p>
I need to get the number "20". The problem is that this number isn't always under the same header: sometimes it is under an "h4", and there are other "h5" headers on the page. So I need to find the header whose text is "Number of Employees" and then extract the number from the paragraph that follows it.
This is the link to the page.
Well, the easiest way is to find the element that contains the text "Number of Employees" and then take the paragraph after it, assuming the paragraph always follows the header directly.
Here's a quick and dirty piece of code that does this and prints the number:
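# soup is the BeautifulSoup object for the fetched page
parent = soup.find("div", id="business-additional-info-text")
for child in parent.children:
    # a header tag (h4, h5, ...) matches when its text node is exactly this string
    if "Number of Employees" in child:
        print(child.find_next("p").get_text(strip=True))
        break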
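If you'd rather not depend on the div id, here is a minimal, self-contained sketch of the same idea. It swaps the loop over the div's children for a single `find` call that accepts any heading level (h1 to h6) and matches the label with a regex, so stray whitespace around the text doesn't break it. The sample `HTML`, the `HEADINGS` list and the `employee_count` function are hypothetical names made up for illustration, and it assumes the count is a plain integer, possibly with thousands separators.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup shaped like the snippet in the question.
HTML = """
<div id="business-additional-info-text">
  <h5>Founded</h5>
  <p>1999</p>
  <h4>Number of Employees</h4>
  <p>
  20
  </p>
</div>
"""

# Accept the label under any heading level, since it moves between h4 and h5.
HEADINGS = ["h1", "h2", "h3", "h4", "h5", "h6"]
LABEL = re.compile(r"^\s*Number of Employees\s*$")


def employee_count(html):
    """Return the integer in the <p> after the 'Number of Employees' heading, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # With both a tag name list and string=, find() matches tags whose .string fits the regex.
    header = soup.find(HEADINGS, string=LABEL)
    if header is None:
        return None
    # find_next() walks forward in document order, skipping whitespace text nodes.
    paragraph = header.find_next("p")
    if paragraph is None:
        return None
    # Drop surrounding whitespace and thousands separators such as "1,200".
    text = paragraph.get_text(strip=True).replace(",", "")
    return int(text) if text.isdigit() else None


print(employee_count(HTML))
```

Run against the sample markup this prints 20. Returning None instead of raising keeps a scraper running when a page has no employee count or uses a range like "11-50".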