
Groovy: merging two XML files with GPathResult.appendNode(node) does not work


I need to write a script that takes several XML files and performs some operations based on their contents. To make it easier to go through all the elements, I wanted to merge all the files into one XML tree (in memory only). I tried using the appendNode() method, but I ran into very strange behaviour. Here's the snippet I use to show the problem:

import groovy.xml.XmlUtil

def slurper = new XmlSlurper()
def a = slurper.parseText("<a></a>")
def b = slurper.parseText("<b>foo</b>")
a.appendNode(b)

println XmlUtil.serialize(a)

a."**".each { println (it.name()) }

It outputs:

<?xml version="1.0" encoding="UTF-8"?><a>
  <b>foo</b>
</a>

a

The serialized XML is correct, but I don't get <b> from the iterator.

However, if I add this line after appending:

a = slurper.parseText(XmlUtil.serialize(a))

output looks like this:

<?xml version="1.0" encoding="UTF-8"?><a>
  <b>foo</b>
</a>

a
b

<b> is there as I expect it to be.

What am I missing here? Why does serializing and parsing again change the output? I'm new to Groovy, so I imagine it could be something obvious; please help me understand why it happens. Or maybe there is a better way to merge XML files?


Solution

  • It happens because XmlSlurper.parseText(String text) returns a GPathResult, which is:

    Base class for representing lazy evaluated GPath expressions.

    And according to Groovy XML processing documentation:

    XmlSlurper evaluates the structure lazily. So if you update the xml you’ll have to evaluate the whole tree again.

    That's why you have to re-evaluate the XML tree with

    a = slurper.parseText(XmlUtil.serialize(a))
    

    to get your expression working.
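
    When several documents have to be merged, the re-evaluation only needs to happen once, after the last append. The following is a minimal sketch of that idea, not a definitive implementation: the helper name mergeWithSlurper, the wrapper element name and the inline XML strings are hypothetical, and the groovy.xml.XmlSlurper import assumes Groovy 3 or later.

    ```groovy
    // Assumes Groovy 3+; on Groovy 2.x XmlSlurper lives in groovy.util and needs no import.
    import groovy.xml.XmlSlurper
    import groovy.xml.XmlUtil

    // Hypothetical helper: append every document to a wrapper root, then
    // serialize and re-parse once so GPath traversal sees the appended nodes.
    def mergeWithSlurper(String rootName, List<String> documents) {
        def slurper = new XmlSlurper()
        def root = slurper.parseText("<${rootName}/>")
        documents.each { root.appendNode(slurper.parseText(it)) }
        // Single re-evaluation of the whole tree, after all appends.
        slurper.parseText(XmlUtil.serialize(root))
    }

    // Inline strings stand in for the contents of real files.
    def merged = mergeWithSlurper('merged', ['<b>foo</b>', '<c>bar</c>'])
    def names = merged.'**'.collect { it.name() }
    println names
    ```

    Re-parsing once at the end keeps the cost at a single serialize/parse round trip, however many documents are appended.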

    If you use XmlParser instead, it works without re-evaluating the XML tree, because XmlParser builds an in-memory tree of Node objects and append() modifies that tree directly, e.g.

    import groovy.xml.XmlUtil
    
    // The parser is reused for both documents; parseText returns a groovy.util.Node
    def parser = new XmlParser()
    def a = parser.parseText("<a></a>")
    def b = parser.parseText("<b>foo</b>")
    
    a.append(b)
    
    println XmlUtil.serialize(a)
    
    a."**".each { println (it.name()) }
    

    Output

    <?xml version="1.0" encoding="UTF-8"?><a>
      <b>foo</b>
    </a>
    
    a
    b
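
As for merging several files, the same XmlParser approach extends to any number of documents. The sketch below is one possible way to do it under stated assumptions: the wrapper element name merged and the inline XML strings are hypothetical placeholders for real input, and the groovy.xml.XmlParser import assumes Groovy 3 or later.

```groovy
// Assumes Groovy 3+; on Groovy 2.x XmlParser lives in groovy.util and needs no import.
import groovy.xml.XmlParser
import groovy.xml.XmlUtil

// Hypothetical inputs; in a real script each document could be read
// with parser.parse(new File(path)) instead of parseText.
def sources = ['<b>foo</b>', '<c><d>bar</d></c>']

def parser = new XmlParser()
// 'merged' is an arbitrary name for the wrapper element.
def merged = parser.parseText('<merged/>')

// Attach each parsed document root as a child of the wrapper.
sources.each { merged.append(parser.parseText(it)) }

// depthFirst() may also yield text in mixed content, so keep only Node entries.
def names = merged.depthFirst().findAll { it instanceof Node }*.name()
println names

println XmlUtil.serialize(merged)
```

No re-parsing step is needed here, since every append changes the Node tree that the traversal walks.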