I am a bit confused about how to get the content of a paragraph tag.
<div class="SomeID">
<p>What a voice! </p>
</div>
This is as far as I got:
list = soup.find_all("div","SomeID")
But how do I get the paragraph content (What a voice!)?
The underlying problem is to get the content of all paragraph tags from the page fetched below:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.dawn.com/news/1267272/democracys-woes').read()
soup = BeautifulSoup(html, 'html.parser')
comments = soup.find_all("div", "comment__body cf")
print(comments)
You can actually do it in one go with a CSS selector:
for p in soup.select("div.SomeID > p"):
    print(p.get_text(strip=True))
Or, if you need a single p element:
soup.select_one("div.SomeID > p").get_text(strip=True)
Note that > here means a direct parent-child relationship: only p elements that are immediate children of the div are matched.
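To make the difference concrete, here is a minimal sketch that compares the child combinator with a plain descendant selector. The markup is hypothetical: it is the snippet from the question with one nested paragraph added for illustration, and the class name NoSuchClass is made up to show what select_one does when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's snippet plus one nested paragraph.
html = """
<div class="SomeID">
  <p>What a voice! </p>
  <blockquote><p>Nested reply</p></blockquote>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.SomeID > p": only <p> tags that are direct children of the div.
direct = [p.get_text(strip=True) for p in soup.select("div.SomeID > p")]

# "div.SomeID p": <p> tags at any depth inside the div.
nested = [p.get_text(strip=True) for p in soup.select("div.SomeID p")]

# select_one returns None when nothing matches, so guard before get_text.
missing = soup.select_one("div.NoSuchClass > p")
missing_text = missing.get_text(strip=True) if missing is not None else None

print(direct)        # ['What a voice!']
print(nested)        # ['What a voice!', 'Nested reply']
print(missing_text)  # None
```

If the paragraphs on the real page are wrapped in further elements inside the div, the descendant form (a space instead of >) is probably the one you want.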