this question refers to an algorithm we have not been able to find it. The problem is the following:
We have an XML (to be specific an ODT File containing a content.xml) containing a depth-aligned structure:
<root xmlns:text="someuri">
<header>
...
</header>
<body>
<text:span dns:att01="value">
some text
</text:span>
<text:span dns:att02="value">
more text
<text:span dns:att03="value">
even nested structures
</text:span>
</text:span>
</body>
</root>
Please note, that this is a simplified example containing only the necessary details. As you can see, this looks like a 'normal' xml structure with a root containing some text and span nodes. For our application we need to do some processing. As all the span nodes contain other nodes, forming a tree like structure, the target format needs to be converted to have the text nodes breadth-aligned. This is the desired format:
<root xmlns:text="someuri">
<header>
...
</header>
<body>
<text:marker-begin text:name="01" />
some text
<text:marker-end text:name="01" />
<text:marker text:name="01" />
<text:marker-begin text:name="02" />
more text
<text:marker-begin text:name="03" />
even nested structures
<text:marker-end text:name="03" />
<text:marker text:name="03" />
<text:marker-end text:name="02" />
<text:marker text:name="02" />
</body>
</root>
Don't let the indentation irritate you, that all text nodes might have a direct parent except the body node. The marker is used to trigger a certain feature from a third party software. The desired text notes are now surounded by empty elements implicating a marking mechanism. Now, after some verbose preparation, the question itself:
How would you transform structure one into structure two using the default DOM mechanisms available via java. Is this even possible? Would you rather suggest a SAX approach to collect the start and ending elements of a span node? Does an algorithm already exists for this problem? XLST is not possible due to a side-processing chain, that must be done during the process.
We found a solution by using a dirty trick:
we have a breadth-first implementation of a traverser (using the TreeWalker doesn't make any sense here) delegate the desired operation to a processing function:
// local field
Queue queue;
void traverse()
{
queue = new LinkedListed();
queue.add(documentRoot);
queue.add(root);
while (!queue.isEmpty())
{
current = queue.poll();
children = current.getChildNodes();
// the delegate
process(current);
for (int i = 0; i < children.getLength(); i++)
{
child = children.item(i);
switch(child.getNodeType())
{
case Node.ELEMENT_NODE:
case Node.TEXT_NODE:
queue.add(child);
break;
}
} // end iteration over childnodes
}
}
and here is the processing function:
void process(Node node)
{
String name = node.getNodeName();
Map<String, String> attributes = XMLUtil.extractAttributes(node);
// this is basically the current node, but we need to save it as
// extra reference to copy all child elements from it, to further process
// the document tree
Node target = null;
Node next = null;
Node parent = node.getParentNode();
if(name.equals("text:" + TARGET_ELEMENT)) {
// deep copy
target = node.cloneNode(true);
// create the three relevant bookmark nodes
Node bkMrkStart = document.createElement("bookmark-begin");
Node bkMrkEnd = document.createElement("bookmark-end");
Node bkMrkRsd = document.createElement("bookmark");
// insert bookmark start
node.getParentNode().insertBefore(bkMrkStart, node);
// get next sibling or null, if last elment
next = node.getNextSibling();
// insert ending bookmark and 'residue'
parent.insertBefore(bkMrkRsd, next);
parent.insertBefore(bkMrkEnd, bkMrkRsd);
// create new its tag element
AuxiliaryElement nextAux = createAuxiliary(attributes);
// apply generated id to created bookmarks
XMLUtil.setAttribute(
"text:span",
"id-[" + nextAux.getId().getString() + "]",
bkMrkStart, bkMrkEnd, bkMrkRsd);
NodeList children = target.getChildNodes();
int index = 0;
do
{
Node child = children.item(index).cloneNode(true);
// it seems necessary to extra save this element
// for further processing
queue.add(child);
parent.insertBefore(child, bkMrkEnd);
} while(++index < children.getLength());
// delete old element
parent.removeChild(node);
// clear target
target = null;
}
}
It looks like #removeChild or #insertBefore is not being reflected by the traversal. It might be, that this is due to our own implementation of a breadth first traverser. However using this approach as described above gives the desired result.