How to parse only the second span tag in an HTML document using Python bs4


I want to parse only one span tag in my HTML document. There are three sibling span tags without any class or id. I am targeting only the second one, using BeautifulSoup 4.

Given the following HTML document:

<div class="adress">
   <span>35456 street</span>
   <span>city, state</span>
   <span>zipcode</span>
</div>

I tried:

for spn in soup.findAll('span'):
    data = spn[1].text

but it didn't work. The expected result is the text of the second span stored in a variable:

data = "city, state"

I would also like to know how to get the first and second spans concatenated in one variable.


Solution

  • You are trying to index an individual span (a Tag instance); indexing a Tag looks up an HTML attribute, not a position. Get rid of the for loop and index the findAll result instead, i.e.

    >>> soup.findAll('span')[1]
    <span>city, state</span>
    

    You can get the first and second tags together using:

    >>> soup.findAll('span')[:2]
    [<span>35456 street</span>, <span>city, state</span>]
    

    or, as a string:

    >>> "".join([str(tag) for tag in soup.findAll('span')[:2]])
    '<span>35456 street</span><span>city, state</span>'