I need to write a script that takes several XML files and performs some operations based on their contents. To make it easier to go through all the elements, I wanted to merge all the files into one XML tree (in memory only). I tried using the appendNode() method, but I encountered very strange behaviour. Here's a snippet that shows the problem:
import groovy.xml.XmlUtil
def slurper = new XmlSlurper()
def a = slurper.parseText("<a></a>")
def b = slurper.parseText("<b>foo</b>")
a.appendNode(b)
println XmlUtil.serialize(a)
a."**".each { println (it.name()) }
It outputs:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>foo</b>
</a>
a
The serialized XML is correct, but I don't get <b> from the iterator.
However, if I add this line after appending:
a = slurper.parseText(XmlUtil.serialize(a))
the output looks like this:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>foo</b>
</a>
a
b
<b> is there, as I expect it to be.
What am I missing here? Why does parsing and serializing again change the output? I'm new to Groovy, so I imagine it could be something obvious. Please help me understand why this happens. Or maybe there is a better way to merge XML files?
It happens because XmlSlurper.parseText(String text) returns a GPathResult, which is:
Base class for representing lazy evaluated GPath expressions.
And according to Groovy XML processing documentation:
XmlSlurper evaluates the structure lazily. So if you update the xml you’ll have to evaluate the whole tree again.
That's why you have to re-evaluate the XML tree with
a = slurper.parseText(XmlUtil.serialize(a))
to get your expression working.
If you use XmlParser, on the other hand, it works without re-evaluating the XML tree, e.g.
import groovy.xml.XmlUtil
XmlParser parser = new XmlParser()
def a = parser.parseText("<a></a>")
def b = parser.parseText("<b>foo</b>")
a.append(b)
println XmlUtil.serialize(a)
a."**".each { println (it.name()) }
Output:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>foo</b>
</a>
a
b
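As for merging several documents into one in-memory tree, here is a minimal sketch of how that could look with XmlParser. The sample documents, the `merged` root element name and the `id` attribute are made up for illustration; in a real script you would probably call `parser.parse(new File(path))` instead of `parseText`.

```groovy
import groovy.xml.*   // star import, so XmlParser resolves on both older and newer Groovy versions

// Hypothetical inputs standing in for the contents of the XML files.
def sources = [
    '<items><item id="1"/></items>',
    '<items><item id="2"/><item id="3"/></items>'
]

def parser = new XmlParser()

// Synthetic root that holds every parsed document (the name is arbitrary).
def merged = new Node(null, 'merged')

// XmlParser builds ordinary Node objects, so appended children are
// visible to later traversals without re-parsing.
sources.each { xml ->
    merged.append(parser.parseText(xml))
}

// Depth-first traversal over the merged tree. The instanceof filter skips
// plain text entries, which can show up for mixed content.
def nodes = merged.'**'.findAll { it instanceof Node }
def names = nodes.collect { it.name() }
def ids = nodes.findAll { it.name() == 'item' }.collect { it.'@id' }

println names
println ids
```

Each source document keeps its own root element under `merged`, so you can still tell which elements came from which file. If you would rather have one flat list, you could append the children of each parsed root instead of the root itself.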