
Groovy XmlSlurper parse mixed text and nodes


I'm trying to parse a node in Groovy that contains mixed content: plain text interleaved with child nodes that have text of their own. I need to get the text in the right order. For example:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>

I want to parse it so that I get the whole text but can still edit the nodes. For this example I want the result:

The text has <b>nodes</b> which need to get parsed

If I could just get a list of everything under the p, where I can test whether each item is a node or text, I would be happy, but I can't find any way to get that.


Solution

  • OK, I found a solution I can use without any (tricky) workarounds. The thing is, a NodeChild doesn't have a method that gives you both the child text and the child nodes, but a Node does: its children() returns them together, in document order. To get Nodes, simply call childNodes() on what the slurper gives you (a NodeChild).

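    // xml: a File, InputStream or Reader (use parseText(xml) for an XML string)
    def root = new XmlSlurper().parse(xml)

    // childNodes() yields the Node objects under <root> (here: <p>)
    root.childNodes().each { target ->
        // Node.children() mixes text chunks and child Nodes, in document order
        for (s in target.children()) {
            if (s instanceof groovy.util.slurpersupport.Node) {
                println "Node: " + s.text()
            } else {
                println "Text: " + s
            }
        }
    }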
    

    This gives me the result:

    Text: The text has
    Node: nodes
    Text: which need to get parsed
    

    Which means I can easily do whatever I want with my nodes while they stay in the right order relative to the text.
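
To get from that loop to the string asked for in the question, the same children() walk can rebuild the content while renaming elements. The following is only a minimal sketch, not part of the original answer: renderMixed and its rename map are hypothetical names, the import assumes Groovy 3 or newer (on Groovy 2.x, drop the import line), and it tests for String rather than Node so it does not depend on which slurpersupport package is in use. It ignores attributes and XML escaping, and it joins the pieces with a single space, which is an assumption that fits this example but would add spaces where the source had none.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x remove this line

def xml = '''<root>
   <p>
      The text has
      <strong>nodes</strong>
      which need to get parsed
   </p>
</root>'''

// Hypothetical helper: rebuilds the mixed content of one slurper Node as a
// string, renaming child elements according to the `rename` map.
String renderMixed(node, Map rename = [:]) {
    node.children().collect { child ->
        if (child instanceof String) {
            // Text chunk: strip and collapse the indentation whitespace
            child.trim().replaceAll(/\s+/, ' ')
        } else {
            // Child Node: look up a replacement tag name, then recurse
            def tag = rename[child.name()] ?: child.name()
            "<${tag}>${renderMixed(child, rename)}</${tag}>"
        }
    }.join(' ')
}

def root = new XmlSlurper().parseText(xml)
def p = root.childNodes().next()   // first Node under <root>, i.e. <p>
println renderMixed(p, [strong: 'b'])
```

Passing an empty map (or no map) keeps the original tag names, so the same helper can be used to just flatten the content in order.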