
Removing only an HTML tag while keeping the text inside it, using Jsoup


I want to remove only the inner "span" tag, without removing the text inside it:

    <blockquote>
          <span>I don’t even bring up technology.</span> 
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

After processing, it should look like this:

    <blockquote>
            I don’t even bring up technology.
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

Please help.


Solution

  • The simplest way is to strip the opening and closing span tags with String.replaceAll() and a regular expression:

        String newHtml = html.replaceAll("</?\\s*span\\b[^>]*>", "");
    

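    If you prefer to use Jsoup, it takes a little more code: move each span's
    child nodes in front of the span, then remove the now-empty span.

        import java.util.ArrayList;

        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.nodes.Element;
        import org.jsoup.nodes.Node;

        Document doc = Jsoup.parse(html);
        for (Element span : doc.select("span")) {
            // Copy the list first, because moving a child modifies span.childNodes()
            for (Node child : new ArrayList<>(span.childNodes())) {
                // Insert the child (text or element) just before the span, preserving order
                span.before(child);
            }
            // The span has no children left, so removing it drops only the tag
            span.remove();
        }
        // Inner HTML of <body>, i.e. the original fragment without the span tags
        String newHtml = doc.body().html();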
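As a quick check of the regex approach, here is a minimal stand-alone sketch that runs a span-stripping replacement on the HTML from the question. The class and method names (StripSpanTags, stripSpanTags) are illustrative only. The pattern </?\s*span\b[^>]*> matches opening and closing span tags with any attributes, and the \b keeps it from touching other tags whose names merely start with "span". Like any regex over HTML, it assumes simple markup (a ">" inside an attribute value would break it).

```java
public class StripSpanTags {

    // Opening or closing span tag with optional attributes; \b stops it
    // from matching other tags whose names merely start with "span"
    private static final String SPAN_TAG = "</?\\s*span\\b[^>]*>";

    // Illustrative helper: deletes only the tags, leaving the text between them in place
    public static String stripSpanTags(String html) {
        return html.replaceAll(SPAN_TAG, "");
    }

    public static void main(String[] args) {
        // The blockquote from the question (\u2019 is the curly apostrophe)
        String html = "<blockquote>\n"
                + "  <span>I don\u2019t even bring up technology.</span>\n"
                + "  I talk about the flow of data.&rdquo;\n"
                + "  <cite>&ndash;Rick Hassman, CIO, Pella</cite>\n"
                + "</blockquote>";
        System.out.println(stripSpanTags(html));
    }
}
```

Because it works on the raw string, everything except the span tags (entities, indentation, the cite element) stays exactly as it was.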
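Jsoup also provides Elements.unwrap(), which removes each matched element from the DOM and moves its children up into its parent. That is exactly the operation asked for, so no explicit loop is needed. This is a sketch assuming a jsoup version that includes unwrap() (check your version's API docs); the class and method names (UnwrapSpans, unwrapSpans) are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UnwrapSpans {

    // Illustrative helper: parses the fragment and unwraps every <span>
    public static String unwrapSpans(String html) {
        Document doc = Jsoup.parse(html);
        // Elements.unwrap() removes each matched element but keeps its child nodes in place
        doc.select("span").unwrap();
        // Return only the body's inner HTML, not the <html>/<head> wrapper Jsoup adds
        return doc.body().html();
    }

    public static void main(String[] args) {
        // The blockquote from the question (\u2019 is the curly apostrophe)
        String html = "<blockquote>\n"
                + "  <span>I don\u2019t even bring up technology.</span>\n"
                + "  I talk about the flow of data.&rdquo;\n"
                + "  <cite>&ndash;Rick Hassman, CIO, Pella</cite>\n"
                + "</blockquote>";
        System.out.println(unwrapSpans(html));
    }
}
```

Unlike the regex, this handles nested spans and odd attribute values correctly, but Jsoup re-serializes the document, so entities such as &rdquo; may come back as literal characters and whitespace may be re-indented.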