Parsing an HTML snippet in Python with BeautifulSoup


I need to parse this HTML string using BeautifulSoup. The string is

<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>

I actually want to get the value Baltimore inside the tag <span property="v:locality">.

But when I run the following code, I can only reach as far as <span class="street-address" property="v:street-address">. How can I get the value inside the tag <span property="v:locality">?

Here is my code:

from bs4 import BeautifulSoup
str = '''<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>'''
soup = BeautifulSoup(str)
print(soup.address.span.span.find_all('property'))

The output is:

[]

Solution

>>> from bs4 import BeautifulSoup
>>> html = '''<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>'''
>>> soup = BeautifulSoup(html, "lxml")
>>> target = soup.find_all('span', attrs={'property': 'v:locality'})
>>> for value in target:
...     print(value.text)
...
Baltimore
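
For context on why the original attempt printed []: the first positional argument to find_all is a tag name, so find_all('property') searches for <property> elements rather than for tags carrying a property attribute. The sketch below is not part of the original answer. It assumes the built-in "html.parser" backend instead of lxml (so nothing extra has to be installed) and shows the empty result next to two other ways of filtering on the attribute. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<address><span rel="v:address"><span dir="ltr">'
    '<span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, '
    '<span class="locality"><span property="v:locality">Baltimore</span>, '
    '<span property="v:region">MD</span> '
    '<span property="v:postal-code">21236</span></span> </span></span></address>'
)

# Assumption: use the parser bundled with Python rather than lxml.
soup = BeautifulSoup(html, "html.parser")

# A bare string is treated as a tag name, so this looks for <property> elements.
by_tag_name = soup.find_all("property")

# True as an attribute value matches any <span> that has a property attribute.
with_property = soup.find_all("span", attrs={"property": True})

# Keyword arguments that are not reserved names are treated as attribute filters.
locality = soup.find("span", property="v:locality")

# The same lookup written as a CSS attribute selector.
locality_css = soup.select_one('span[property="v:locality"]')

print(by_tag_name)
print([tag["property"] for tag in with_property])
print(locality.get_text())
print(locality_css.get_text())
```

find and select_one return a single tag (or None when nothing matches), which is convenient when only one locality is expected. find_all returns a list and suits the loop used in the answer above.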