Tags: web-scraping, beautifulsoup, screen-scraping, urllib2, python-requests

How to get only the inner text of a tag in BeautifulSoup, excluding the text of embedded tags?


For example,

<ul>
    <li>
        <b>Hey, sexy!</b>
        Hello
    </li>
</ul>

I want only 'Hello' from the li tag.

If I use soup.find("ul").li.text, it includes the text of the b tag as well.


Solution

  • You could use find with text=True and recursive=False, which returns the first text node that is a direct child of the tag:

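    from bs4 import BeautifulSoup

    html = '''<ul><li><b>Hey, sexy!</b>Hello</li></ul>'''
    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(html, 'html.parser')
    # text=True matches text nodes; recursive=False limits the search to direct children of <li>
    # (Beautiful Soup 4.4.0+ also accepts string=True in place of text=True)
    print(soup.find('li').find(text=True, recursive=False))  # prints: Hello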
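One caveat: with the indented HTML from the question, the first direct text node of the li is the whitespace before the b tag, so the snippet above may return blank text rather than 'Hello'. A minimal sketch of one way around this, assuming you want all of the tag's own text joined together, is to walk the direct children and keep only the non-empty strings. The helper name own_text is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString

# The HTML from the question, with its original indentation and newlines
html = """
<ul>
    <li>
        <b>Hey, sexy!</b>
        Hello
    </li>
</ul>
"""


def own_text(tag):
    """Return the text sitting directly inside `tag`, skipping text in child tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    pieces = (
        child.strip()
        for child in tag.children            # direct children only
        if isinstance(child, NavigableString)  # keep strings, drop nested tags
    )
    # Drop whitespace-only nodes and join whatever is left
    return " ".join(piece for piece in pieces if piece)


soup = BeautifulSoup(html, "html.parser")
result = own_text(soup.find("li"))
print(result)
```

Note that HTML comments are also NavigableString subclasses in bs4, so this sketch would include their content if the tag contained any.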