Tags: python, python-2.7, beautifulsoup, pydev

Get Paragraph Content


I am a bit confused about getting the content of a paragraph tag.

<div class="SomeID">
<p>What a voice! </p>
</div>

I got as far as this:

divs = soup.find_all("div", "SomeID")

But how do I get the paragraph's content (What a voice!)?

The underlying problem is getting the content of all the paragraph tags inside the comment__body cf divs on this page:

import urllib
from bs4 import BeautifulSoup

html = urllib.urlopen('http://www.dawn.com/news/1267272/democracys-woes').read()
soup = BeautifulSoup(html, 'html.parser')
comments = soup.find_all("div", "comment__body cf")
print comments

Solution

  • You can actually do it in one go with a CSS selector:

    for p in soup.select("div.SomeID > p"):
        print(p.get_text(strip=True))
    

    Or, if you only need the first matching p element:

    soup.select_one("div.SomeID > p").get_text(strip=True)
    

    Note that > here is the child combinator: it only matches p elements that are direct children of div.SomeID, not deeper descendants.
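If you'd rather keep the find_all() call from the question, you can loop over the matched divs and call find_all("p") on each one. Below is a minimal sketch run against a small HTML string. The markup for the comment__body cf divs is a hypothetical assumption (the <p> tags are nested one level deeper on purpose) and is not taken from the real dawn.com page; paragraph_texts is likewise just an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the SomeID div from the question plus two
# assumed "comment__body cf" divs whose <p> tags are NOT direct children.
html = """
<div class="SomeID">
<p>What a voice! </p>
</div>
<div class="comment__body cf"><div><p>First comment</p></div></div>
<div class="comment__body cf"><div><p>Second comment</p></div></div>
"""

soup = BeautifulSoup(html, "html.parser")


def paragraph_texts(soup, css_class):
    """Hypothetical helper completing the question's find_all() approach:
    collect the stripped text of every <p> inside each matching <div>."""
    texts = []
    for div in soup.find_all("div", class_=css_class):
        # find_all("p") searches all descendants, not only direct children
        for p in div.find_all("p"):
            texts.append(p.get_text(strip=True))
    return texts


some_id_texts = paragraph_texts(soup, "SomeID")
comment_texts = paragraph_texts(soup, "comment__body cf")

# CSS equivalent for the comments: a space (descendant combinator) instead
# of >, because in this sample the <p> tags sit inside an inner <div>.
selector_texts = [p.get_text(strip=True)
                  for p in soup.select("div.comment__body.cf p")]

print(some_id_texts)
print(comment_texts)
print(selector_texts)
```

One design point: class_="comment__body cf" matches the exact class attribute string, so a div written as class="cf comment__body" would be missed, whereas the selector div.comment__body.cf matches both classes in either order. Also keep in mind that select_one() returns None when nothing matches, so check the result before calling get_text() on it.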