
Strip the first (top-level) tag in BeautifulSoup


I create a soup:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p></div>", "html.parser")

I want to strip the first top-level tag to reveal its contents, regardless of the tag:

<p>My paragraph <a>My link</a></p>

with all of its children intact. I don't want to find and replace by tag name, as with soup.find("div"); I want to do this positionally.

How can this be done?


Solution

  • Use the built-in .unwrap() method, which replaces a tag with its contents:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p><p>hello again</p></div>","html.parser")
    
    # Replace the first top-level node (the <div>) with its children, in place
    soup.contents[0].unwrap()
    
    print(soup)
    print(len(soup.contents))
    

    Result:

    <p>My paragraph <a>My link</a></p><p>hello again</p>
    2
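
    Note that soup.contents[0] is only the wrapper tag when the markup starts with a tag. If the input begins with whitespace or other text, the first node is a text node rather than a tag. A minimal sketch of a more defensive variant follows; the helper name unwrap_first_tag is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def unwrap_first_tag(soup):
    """Unwrap the first top-level Tag in `soup`, skipping text nodes.

    Hypothetical helper for illustration; it only relies on
    `.contents` and `Tag.unwrap()` from BeautifulSoup.
    """
    for node in soup.contents:
        if isinstance(node, Tag):
            node.unwrap()  # replace the tag with its children, in place
            break          # stop after the first tag; contents has changed
    return soup


# The leading newline makes soup.contents[0] a text node, not the <div>
soup = BeautifulSoup("\n<div><p>My paragraph <a>My link</a></p></div>", "html.parser")
unwrap_first_tag(soup)
print(str(soup).strip())  # <p>My paragraph <a>My link</a></p>
```

    The loop stops at the first Tag it finds, so later top-level siblings are left untouched, which matches the positional behaviour asked for in the question.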