JSoup - remove tags with particular word in them (and everything within the tags)


I have a webpage that has HTML tags of the form:

<section class="feature-authorized-retailer pdp-outofstock-js hide">
    <div class="retailer-notification">
        <span>This product is out of stock</span>
    </div>

    <section id="marketing-product-actions" class="product-content-form-marketing-product-actions product-actions">
        <div class="product-content-form-product-actions-primary product-actions-primary">
            <a class="product-content-form-out-of-stock button secondary">Out of Stock</a>
        </div>
    </section>
</section>

As you can see, the outer "section" tag has the word "hide" in its class attribute. Is there a way to use JSoup to identify tags like this one, with the word "hide" in the class name, so that I can remove them along with all the HTML inside them?


Solution

  • To select elements using Jsoup you can use most CSS Selectors.

    1. Select all elements with class hide:
    document.select(".hide")
    

    An element may have several classes; this selector matches if any one of them is exactly hide.

    It will match class="abc hide abc" but won't match class="abc abchideabc abc".

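    2. Select all elements whose class attribute value contains the string hide:
    document.select("[class~=hide]")
    

    In jsoup, [attr~=regex] is a regular-expression match against the whole attribute value (in standard CSS, ~= matches a whitespace-separated word instead). This one will match class="abc hide abc", but it will also match class="abc abchideabc abc". The plain substring form [class*=hide] behaves the same way here.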

    To remove the selected elements, together with everything nested inside them, call document.select(...).remove().
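
Putting the selectors and the removal step together, a minimal sketch might look like the following. It assumes jsoup is on the classpath. The class name RemoveHiddenSections, the helpers removeHidden and countMatches, and the extra abchideabc paragraph in the sample are made up for illustration and are not part of the original question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveHiddenSections {

    // Hypothetical helper: parses the HTML, removes every element whose class
    // list contains the exact token "hide" (including all of its descendants),
    // and returns the remaining body HTML.
    public static String removeHidden(String html) {
        Document document = Jsoup.parse(html);
        document.select(".hide").remove();
        return document.body().html();
    }

    // Hypothetical helper: counts how many elements a selector matches,
    // useful for comparing ".hide" with "[class~=hide]".
    public static int countMatches(String html, String selector) {
        return Jsoup.parse(html).select(selector).size();
    }

    public static void main(String[] args) {
        // Shortened version of the markup from the question, plus one
        // element whose class merely contains "hide" as a substring.
        String html =
            "<section class=\"feature-authorized-retailer pdp-outofstock-js hide\">"
          + "<div class=\"retailer-notification\"><span>This product is out of stock</span></div>"
          + "</section>"
          + "<p class=\"abchideabc\">kept</p>";

        // ".hide" matches only the exact class token.
        System.out.println(countMatches(html, ".hide"));
        // "[class~=hide]" is a regex match, so it also matches "abchideabc".
        System.out.println(countMatches(html, "[class~=hide]"));
        // The hidden section and its children are gone; the paragraph stays.
        System.out.println(removeHidden(html));
    }
}
```

Using .hide for the removal is the safer choice when only the exact class should be targeted; the regex form would also delete elements such as the abchideabc paragraph above.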