I have some HTML that I want to extract text from. Here's an example of the HTML:
<p>TEXT I WANT <i> – </i></p>
Now, there are, obviously, lots of <p> tags in this document, so find('p') is not a good way to get at the text I want to extract. However, that <i> tag is the only one in the document, so I thought I could just find the <i> and then go to its parent.
I've tried:
up = soup.select('p i').parent
and
up = soup.select('i')
print(up.parent)
and I've tried it with .parents, find_all('i'), and find('i'). But I always get:
'list' object has no attribute "parent"
What am I doing wrong?
This works:
i_tag = soup.find('i')
my_text = str(i_tag.previous_sibling).strip()  # previousSibling is the older spelling of this attribute
output:
'TEXT I WANT'
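For anyone who wants to run this end to end, here is a minimal self-contained sketch. The `html` string is made up for illustration (the extra `<p>` is only there to stand in for the "lots of `<p>` tags"), and it assumes BeautifulSoup 4 with the built-in `html.parser`. It also shows the parent lookup from the question working once a single tag is picked out instead of a list.

```python
from bs4 import BeautifulSoup

# Illustrative document: one unrelated <p>, plus the <p> that holds the only <i>.
html = "<p>other paragraph</p><p>TEXT I WANT <i> – </i></p>"
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so .parent is available on it.
i_tag = soup.find("i")
parent = i_tag.parent

# select() returns a list-like result; index into it before using .parent.
same_parent = soup.select("p i")[0].parent

# The wanted text is the text node immediately before the <i> tag.
my_text = str(i_tag.previous_sibling).strip()
print(my_text)
```

Here `parent` and `same_parent` both refer to the enclosing `<p>` tag, so either route gets you to the paragraph.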
As mentioned in other answers, find_all() returns a list, whereas find() returns the first match or None. select() returns a list as well, which is why calling .parent on its result fails. If you are unsure whether an <i> tag is present, you could simply use a try/except block.
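A rough sketch of that guard, assuming BeautifulSoup 4 and the built-in `html.parser`. The helper name `text_before_i` is hypothetical, not part of any library: when find() returns None, the attribute access raises AttributeError, which is what the try/except catches.

```python
from bs4 import BeautifulSoup


def text_before_i(html):
    """Return the stripped text just before the first <i> tag, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    i_tag = soup.find("i")
    try:
        sibling = i_tag.previous_sibling
    except AttributeError:
        # find() returned None because the document has no <i> tag.
        return None
    if sibling is None:
        # The <i> tag exists but nothing precedes it inside its parent.
        return None
    return str(sibling).strip()


print(text_before_i("<p>TEXT I WANT <i> – </i></p>"))
print(text_before_i("<p>no italics here</p>"))
```

An explicit `if i_tag is None:` check would do the same job. Which one reads better is mostly a matter of taste.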