
BeautifulSoup parent tag


I have some html that I want to extract text from. Here's an example of the html:

<p>TEXT I WANT <i> &#8211; </i></p>

Now, there are, obviously, lots of <p> tags in this document. So, find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document. So, I thought I could just find the <i> and then go to the parent.

I've tried:

up = soup.select('p i').parent

and

up = soup.select('i')
print(up.parent)

and I've tried it with .parents, I've tried find_all('i'), find('i')... But I always get:

'list' object has no attribute "parent"

What am I doing wrong?


Solution

  • This works:

    i_tag = soup.find('i')  # first <i> tag, or None if there is none
    my_text = str(i_tag.previous_sibling).strip()  # text node just before <i>
    

    output:

    'TEXT I WANT'
    

    As mentioned in other answers, find_all() and select() return a list of matches, and a list has no .parent attribute, whereas find() returns the first match or None.

    If you are unsure whether an i tag is present, find() may return None, and accessing an attribute on it raises AttributeError; you could simply wrap the lookup in a try/except block.
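
To tie the points above together, here is a minimal sketch. It assumes the bs4 package is installed and uses the standard-library "html.parser" backend. The sample html string and the helper name text_before_i are made up for illustration and are not from the question. It shows indexing into the list that select() returns, going from the <i> tag to its parent, and guarding against a missing <i> tag with try/except.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: several <p> tags, only one <i> tag.
html = "<p>some other paragraph</p><p>TEXT I WANT <i> &#8211; </i></p>"
soup = BeautifulSoup(html, "html.parser")

# select() and find_all() return a list of matches, so pick an element
# before asking for .parent.
matches = soup.select("p i")
first_parent = matches[0].parent if matches else None


def text_before_i(soup):
    """Return the text just before the first <i> tag, or None if unavailable."""
    try:
        # find() returns None when there is no <i>, so the attribute
        # access raises AttributeError in that case.
        sibling = soup.find("i").previous_sibling
    except AttributeError:
        return None
    # previous_sibling is None when <i> is the first child of its parent.
    return str(sibling).strip() if sibling is not None else None


wanted = text_before_i(soup)
missing = text_before_i(BeautifulSoup("<p>no italics here</p>", "html.parser"))

# The approach from the question: go from the single <i> up to its <p>.
parent_tag = soup.find("i").parent

print(wanted)
print(missing)
print(parent_tag.name)
```

Note that parent_tag.get_text() would also include the text inside the <i> tag itself, which is why the previous-sibling approach is the more direct way to get only the leading text.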