
How to find all comments with Beautiful Soup


A similar question was asked four years ago, but its answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 represents each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
     comments.decompose()

That didn't work. How do I find all comments using BS4?


Solution

  • You can pass a function to find_all() to help it check whether the string is a Comment.

    For example, given the HTML below:

    <body>
       <!-- Branding and main navigation -->
       <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
       <div class="l-branding">
          <p>Just a brand</p>
       </div>
       <!-- test comment here -->
       <div class="block_content">
          <a href="https://www.google.com">Google</a>
       </div>
    </body>
    

    Code:

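    from bs4 import BeautifulSoup as BS
    from bs4 import Comment
    # ... (html holds the markup shown above)
    soup = BS(html, 'html.parser')
    # Keep only the text nodes that are Comment instances
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for c in comments:
        print(c)
        print("===========")
        c.extract()  # remove the comment from the tree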
    

    The output would be:

    Branding and main navigation 
    ===========
    test comment here
    ===========
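
Since the original goal was to delete the comments, the loop above could be wrapped in a small helper that returns the cleaned markup. This is only a minimal sketch: `strip_comments` and the `sample` string are hypothetical names made up for illustration, and it assumes a bs4 version that accepts the `string=` argument to `find_all()`.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every comment node removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so removing nodes while looping over it is safe.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return str(soup)


# Illustrative input, not taken from a real page.
sample = "<body><!-- Branding --><p>Just a brand</p><!-- test --></body>"
cleaned = strip_comments(sample)
print(cleaned)
```

`extract()` is used here because it is what the answer's loop calls; it detaches each comment from the tree, so the serialized result no longer contains it.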
    

    BTW, I think the reason find_all('comment') doesn't work is this (from the Beautiful Soup documentation):

    Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
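
The difference described in that passage can be seen side by side. The following is a small sketch on a made-up snippet (the `html` string and the variable names `by_name` and `by_type` are assumptions for illustration): a plain string argument is matched against tag names, while the `string=` filter is tested against text nodes, which is where comments live.

```python
from bs4 import BeautifulSoup, Comment

# Illustrative markup with a single comment.
html = "<div><!-- note --><p>text</p></div>"
soup = BeautifulSoup(html, "html.parser")

# A string as the first argument is treated as a tag name,
# so this searches for <comment> elements and finds none.
by_name = soup.find_all("comment")

# The string= filter is applied to text nodes; Comment is one kind of text node.
by_type = soup.find_all(string=lambda text: isinstance(text, Comment))

print(len(by_name))               # 0
print([str(c) for c in by_type])  # [' note ']
```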