I have a span element like the one below. How can I extract only the text that sits outside the anchor (a) tags?
# print soup.prettify()
<span class="1">
text_wanted
<a data-toggle="notify" href="https://www.abc.com/1" class="class1"><span>text1</span></a>
<a data-toggle="notify" href="https://www.abc.com/2" class="class2"><span>text2</span></a>
</span>
I am thinking about the solution below:
text_all = soup.text.encode('utf-8')
text_strip_list = [a.text.encode('utf-8').strip() for a in soup.find_all('a')]
for text_strip in text_strip_list:
    text_all = text_all.replace(text_strip, '').strip()
Is there an easier way to get the wanted text without having to dig into each anchor tag?
Thanks in advance.
Assuming html is the BeautifulSoup object with the parsed HTML,
from BeautifulSoup import NavigableString
print [node for node in html.find('span').contents if type(node) is NavigableString]
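will yield the text nodes that are direct children of the outermost span. Text nested inside the a tags is skipped, but whitespace-only strings between the tags are included, so strip and filter the results if you only want text_wanted.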
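For anyone on the newer bs4 package with Python 3, here is a minimal sketch of the same idea. The helper name direct_text and the HTML constant are made up for illustration, and it assumes the built-in "html.parser" backend; treat it as one possible way to do it rather than the definitive one.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the question.
HTML = """
<span class="1">
text_wanted
<a data-toggle="notify" href="https://www.abc.com/1" class="class1"><span>text1</span></a>
<a data-toggle="notify" href="https://www.abc.com/2" class="class2"><span>text2</span></a>
</span>
"""


def direct_text(tag):
    """Join the text nodes that are direct children of `tag` (hypothetical helper).

    Nested tags such as <a> are skipped, so their text is not included.
    """
    # `type(...) is NavigableString` mirrors the check above and also
    # leaves out subclasses such as HTML comments.
    parts = [node.strip() for node in tag.contents if type(node) is NavigableString]
    # Drop the whitespace-only strings that sit between the child tags.
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(HTML, "html.parser")
outer_span = soup.find("span")  # first <span> in the document, i.e. the outer one
print(direct_text(outer_span))  # prints: text_wanted
```

Because only the direct children are inspected, there is no need to collect the anchor text first and remove it from the full text afterwards.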