Tags: python, html, css, python-2.7, beautifulsoup

Remove portion of html (tag) keeping style - python


I would like to remove a portion of an HTML document that contains a specific string before saving it. The tag contains a person's name, and I would like to remove the entire tag to make the page anonymous.

The HTML is:

<div id="top-card" data-li-template="top_card">...</div>

and all its children.

I explored using BeautifulSoup but could not find a solution.

Is there a way that I can just remove the entire portion of the HTML while keeping the style intact?

Thanks!


Solution

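  • You can use .extract() to remove elements from the document with BeautifulSoup. It detaches the tag and all of its children from the tree and returns the removed tag.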

    Assuming you want to remove the div whose id is "top-card":

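    >>> from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed
    >>> html = """
    ... <div id="top-card" data-li-template="top_card"><div>test</div></div>
    ... <div>test</div> <div id="foo">blah</div>"""
    >>> soup = BeautifulSoup(html, "lxml")  # the <html><body> wrapper in the output comes from the lxml parser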
    >>> [div.extract() for div in soup("div",id="top-card")]
    [<div data-li-template="top_card" id="top-card"><div>test</div></div>]
    >>> soup
    <html><body>
    <div>test</div> <div id="foo">blah</div></body></html>
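On the "keeping the style intact" part of the question: removing one element from the parsed tree leaves the other nodes, including any `<style>` block, in place. Below is a minimal sketch of that, assuming BeautifulSoup 4 (`bs4`) and the built-in `html.parser`. The helper name `remove_by_id` and the sample page (including the placeholder name and CSS rule) are made up for illustration and are not from the original question. It uses `.decompose()`, which destroys the tag instead of returning it as `.extract()` does.

```python
from bs4 import BeautifulSoup


def remove_by_id(html, element_id):
    """Return the HTML with every element carrying `element_id` removed.

    Hypothetical helper for illustration; children of the matched
    element are removed along with it.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(id=element_id):
        # decompose() removes the tag and its children from the tree
        tag.decompose()
    return str(soup)


# Made-up sample page: a <style> block plus the div to anonymize.
page = (
    "<html><head><style>.name { color: red; }</style></head><body>"
    '<div id="top-card" data-li-template="top_card">'
    '<span class="name">Jane Doe</span></div>'
    '<div id="foo">blah</div>'
    "</body></html>"
)

cleaned = remove_by_id(page, "top-card")
print(cleaned)
```

The returned string can then be written to a file. Any CSS rules that targeted the removed element simply no longer match anything, and the remaining markup is unchanged.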