java html jsoup

How to remove tags with jsoup but keep given tags


How to remove all tags except <p> and <img> with jsoup?

<div>
  <p>hello world
    <span>good</span>
    <img src="/src/img/beauty.jpg"/>
    welcome
  </p>
</div>

Should become

<p>hello world
    good
    <img src="/src/img/beauty.jpg"/>
    welcome
  </p>

Solution

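  • You're going to want to look at Jsoup.clean(), the convenience wrapper around Cleaner.clean(). You pass it a Whitelist of the tags you want to allow; tags that are not on the list are removed, but the text inside them is kept. Note that jsoup 1.14.2 renamed Whitelist to Safelist, and later releases dropped the old name, so on a current jsoup use Safelist instead.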

    Example from jsoup.org:

    String unsafe =
        "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
    String safe = Jsoup.clean(unsafe, Whitelist.basic());
    // now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
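Applied to the HTML in the question, the same approach might look like the sketch below. It is not from the original answer. It assumes a recent jsoup on the classpath, where the class is named Safelist (on versions before 1.14.2, substitute Whitelist), and the class and method names KeepOnlyPAndImg and keepPAndImg are made up for illustration. It starts from an empty list with Safelist.none() rather than basic(), since basic() allows more tags than just p and does not allow img.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class KeepOnlyPAndImg {

    // Hypothetical helper: removes every tag except <p> and <img>.
    // Text inside removed tags (e.g. the <span>) is kept.
    public static String keepPAndImg(String html) {
        Safelist allowed = Safelist.none()       // start with no tags allowed
                .addTags("p", "img")             // allow only these two tags
                .addAttributes("img", "src");    // keep the src attribute on <img>
        return Jsoup.clean(html, allowed);
    }

    public static void main(String[] args) {
        String html = "<div><p>hello world <span>good</span> "
                + "<img src=\"/src/img/beauty.jpg\"/> welcome</p></div>";
        System.out.println(keepPAndImg(html));
    }
}
```

The <div> and <span> tags are dropped while their text stays. jsoup may reflow whitespace in the output, so do not rely on the exact line breaks. Any attribute not listed in addAttributes() is stripped as well, so add others (alt, for example) if you need them.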