Joining HTML elements with JSoup


Is there any way in JSoup to join two or more elements in memory - i.e., in the Document tree, without producing the raw HTML string?

For example, the following HTML div element with some nested tags

<div>This is text with <custom>a custom nested tag</custom> and some <other>text within a tag</other>, all of which should become part of the top-level </div>.

would be transformed into

<div>This is text with a custom nested tag and some text within a tag, all of which should become part of the top-level </div>.

Essentially, the nested tags in the example above have been deleted but their content has remained, as if a string replace() operation had been run on the raw HTML before it was parsed into a Document object by JSoup.

The overall operation could be coded like this:

public static void splice(Document document, List<String> tags) {
  for (String tag : tags) {
    // Find the tag node (Element) in the tree
    // Remove the tag node and join its content with its parent
  }
}

Solution

  • Jsoup's unwrap() method is what you're looking for. It removes the element from the tree but keeps its child nodes (nested elements and text), moving them up into the element's parent.
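
A minimal sketch of how the splice method from the question could be filled in with unwrap(), assuming jsoup is on the classpath. The class name SpliceExample and the sample HTML string are made up for illustration; select() and Elements.unwrap() are the jsoup calls doing the work.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SpliceExample {

    // Hypothetical helper based on the question's outline: removes every
    // element whose tag name is listed in `tags`, keeping its content.
    public static void splice(Document document, List<String> tags) {
        for (String tag : tags) {
            // select(tag) finds all elements with this tag name.
            // Elements.unwrap() removes each matched element and moves its
            // children (text and nested elements) up into the parent.
            document.select(tag).unwrap();
        }
    }

    public static void main(String[] args) {
        // Illustrative input, modelled on the example in the question.
        String html = "<div>This is text with <custom>a custom nested tag</custom>"
                + " and some <other>text within a tag</other>, all in the top-level div.</div>";

        Document document = Jsoup.parseBodyFragment(html);
        splice(document, List.of("custom", "other"));

        // The <custom> and <other> elements are gone; their text remains in the div.
        System.out.println(document.body().html());
    }
}
```

Everything happens on the in-memory Document, so no raw HTML string needs to be rebuilt and re-parsed. To unwrap a single element rather than every match, calling unwrap() on that Element should work the same way.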