Groovy: Deleting all XML elements whose tag value contains a letter or special character


I have written some code that is not working as expected. Can anyone tell me what I am doing wrong?

Input

    
<DLF>
    <DeliveryOrder>
        <COMMETTANT>3260</COMMETTANT>
        <COMPTE/>
        <REFCDEEXTERNE>BIS'ART-17.04.23</REFCDEEXTERNE>
        <ETATCOMMANDE>EXP</ETATCOMMANDE>
        <DATEETATDECOMMANDE>240420230000</DATEETATDECOMMANDE>
    </DeliveryOrder>
    <DeliveryOrder>
        <COMMETTANT>3260</COMMETTANT>
        <COMPTE/>
        <REFCDEEXTERNE>WEB230415_33191</REFCDEEXTERNE>
        <ETATCOMMANDE>EXP</ETATCOMMANDE>
        <DATEETATDECOMMANDE>190420230940</DATEETATDECOMMANDE>
    </DeliveryOrder>
    <DeliveryOrder>
        <COMMETTANT>3260</COMMETTANT>
        <COMPTE/>
        <REFCDEEXTERNE>23041533191</REFCDEEXTERNE>
        <ETATCOMMANDE>EXP</ETATCOMMANDE>
        <DATEETATDECOMMANDE>190420230940</DATEETATDECOMMANDE>
    </DeliveryOrder>

</DLF>


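Requirement: remove every DeliveryOrder whose REFCDEEXTERNE is not purely numeric (i.e. it contains a letter or special character), so that only the orders whose REFCDEEXTERNE consists of digits remain.

Groovy script: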

import java.util.HashMap;
import groovy.xml.XmlUtil;
import groovy.xml.StreamingMarkupBuilder;
import groovy.xml.*;
import org.apache.camel.converter.stream.InputStreamCache;
def Message processData(Message message) {

    def body = message.getBody();

    if (body instanceof InputStreamCache) {
        // Convert InputStreamCache to a String
        body = body.getText();
    }

 /*   def list = new XmlParser().parseText(body)
    def nodeToDel = list.DLF.DeliveryOrder.find { it.REFCDEEXTERNE.text() =~ /[^\d]/ }
    def parent = nodeToDel.parent()
    parent.remove(nodeToDel)
    def valid_data=XmlUtil.serialize(list)
    message.setBody(valid_data);*/
    
    def list = new XmlParser().parseText(body)
    //list.DLF.DeliveryOrder.removeAll { it.REFCDEEXTERNE.text() =~ /[^0-9]/ }
   //list.DLF.DeliveryOrder.removeAll { it.REFCDEEXTERNE.text() =~ /[a-zA-Z]/ };
    list.DLF.DeliveryOrder.removeAll { it.REFCDEEXTERNE.text() =~ /[^0-9]/ };
   
    def validData = XmlUtil.serialize(list);
    message.setBody(validData);
    
    return message;
}

Output received after executing the code

<?xml version="1.0" encoding="UTF-8"?><Root>
  <DLF>
    <DeliveryOrder>
      <COMMETTANT>3260</COMMETTANT>
      <COMPTE/>
      <REFCDEEXTERNE>BIS'ART-17.04.23</REFCDEEXTERNE>
      <ETATCOMMANDE>EXP</ETATCOMMANDE>
      <DATEETATDECOMMANDE>240420230000</DATEETATDECOMMANDE>
    </DeliveryOrder>
    <DeliveryOrder>
      <COMMETTANT>3260</COMMETTANT>
      <COMPTE/>
      <REFCDEEXTERNE>WEB230415_33191</REFCDEEXTERNE>
      <ETATCOMMANDE>EXP</ETATCOMMANDE>
      <DATEETATDECOMMANDE>190420230940</DATEETATDECOMMANDE>
    </DeliveryOrder>
    <DeliveryOrder>
        <COMMETTANT>3260</COMMETTANT>
        <COMPTE/>
        <REFCDEEXTERNE>23041533191</REFCDEEXTERNE>
        <ETATCOMMANDE>EXP</ETATCOMMANDE>
        <DATEETATDECOMMANDE>190420230940</DATEETATDECOMMANDE>
    </DeliveryOrder>

  </DLF>
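A minimal check of the script's two GPath steps against a trimmed copy of the sample input, assuming (as in the input shown) that `<DLF>` is the root element of the body:

```groovy
import groovy.xml.*

// Trimmed copy of the question's input; <DLF> is the root element
def body = '''<DLF>
  <DeliveryOrder><REFCDEEXTERNE>WEB230415_33191</REFCDEEXTERNE></DeliveryOrder>
  <DeliveryOrder><REFCDEEXTERNE>23041533191</REFCDEEXTERNE></DeliveryOrder>
</DLF>'''

def list = new XmlParser().parseText(body)

// parseText returns the root <DLF> node itself, so list.DLF looks for
// a <DLF> child of <DLF> and finds nothing
println list.DLF.size()              // 0

// Even with the right path, removeAll only filters the NodeList that the
// GPath expression built; the parsed tree still holds both orders
def orders = list.DeliveryOrder
orders.removeAll { it.REFCDEEXTERNE.text() =~ /[^0-9]/ }
println orders.size()                // 1
println list.DeliveryOrder.size()    // 2
```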

Expected Output

<DLF>
    <DeliveryOrder>
        <COMMETTANT>3260</COMMETTANT>
        <COMPTE/>
        <REFCDEEXTERNE>23041533191</REFCDEEXTERNE>
        <ETATCOMMANDE>EXP</ETATCOMMANDE>
        <DATEETATDECOMMANDE>190420230940</DATEETATDECOMMANDE>
    </DeliveryOrder>

</DLF>

As I understand it, the output should not contain any DeliveryOrder node whose REFCDEEXTERNE contains a letter or special character.
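The expected output amounts to keeping an order only when its REFCDEEXTERNE consists entirely of digits. A minimal sketch of that test applied to the three sample values:

```groovy
// The three REFCDEEXTERNE values from the sample input
def refs = ["BIS'ART-17.04.23", 'WEB230415_33191', '23041533191']

// ==~ must match the whole string, so /\d+/ means "digits and nothing else"
def isNumericOnly = { String ref -> ref ==~ /\d+/ }

def toKeep = refs.findAll(isNumericOnly)
def toRemove = refs.findAll { !isNumericOnly(it) }

println "keep:   $toKeep"    // keep:   [23041533191]
println "remove: $toRemove"  // remove: [BIS'ART-17.04.23, WEB230415_33191]
```

With `\d+` an empty REFCDEEXTERNE counts as non-numeric and would be removed; a "contains a non-digit" test such as `/[^0-9]/` keeps it instead, since an empty string contains no non-digit.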

Please guide me.


Solution

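Your script leaves the document unchanged for two reasons. First, if the body starts at `<DLF>` as in the input shown, `XmlParser.parseText` returns that `<DLF>` node itself, so `list.DLF` looks for a `<DLF>` child of `<DLF>` and finds nothing. Second, even with the right path, `removeAll` only removes entries from the `NodeList` that the GPath expression returns; it never detaches those nodes from the parsed tree.

So, you need to:

1. Parse the XML.
2. Find the nodes to remove.
3. Remove each of them from its parent.
4. Convert the XML document back to a String.

This should do it (comments explain each section):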

    import groovy.xml.*
    
    // Parse the XML text (body is the message body String from processData)
    def xml = new XmlParser().parseText(body)
    
    // Find all DeliveryOrder nodes whose REFCDEEXTERNE contains a non-digit.
    // xml is the root <DLF> node; if the body is wrapped in another root
    // element (as the <Root> in your output suggests), use xml.DLF.DeliveryOrder
    xml.DeliveryOrder.findAll {
        def field = it.REFCDEEXTERNE.text()
        def matches = field ==~ /.*[^0-9].*/
        matches
    }.each {
        // And then remove each of them from its parent node
        it.parent().remove(it)
    }
    
    // Then convert it back to a String
    StringWriter stringWriter = new StringWriter()
    XmlNodePrinter nodePrinter = new XmlNodePrinter(new PrintWriter(stringWriter))
    nodePrinter.setPreserveWhitespace(true)
    nodePrinter.print(xml)
    def newBody = stringWriter.toString()
    
    // And print it out (inside processData, call message.setBody(newBody) instead)
    println newBody
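If you would rather keep `XmlUtil.serialize` from the question (it also writes the `<?xml ...?>` declaration), here is a minimal sketch of an alternative technique using `XmlSlurper` instead of `XmlParser`. `replaceNode {}` with an empty closure marks a node to be replaced by nothing, and the removal only takes effect when the tree is serialized. The input is a trimmed copy of the sample.

```groovy
import groovy.xml.*

// Trimmed copy of the question's input, keeping only REFCDEEXTERNE
def body = '''<DLF>
  <DeliveryOrder><REFCDEEXTERNE>BIS'ART-17.04.23</REFCDEEXTERNE></DeliveryOrder>
  <DeliveryOrder><REFCDEEXTERNE>WEB230415_33191</REFCDEEXTERNE></DeliveryOrder>
  <DeliveryOrder><REFCDEEXTERNE>23041533191</REFCDEEXTERNE></DeliveryOrder>
</DLF>'''

def root = new XmlSlurper().parseText(body)

// Replace every order whose REFCDEEXTERNE is not digits-only with nothing
root.DeliveryOrder.findAll { order ->
    !(order.REFCDEEXTERNE.text() ==~ /\d+/)
}.each { order ->
    order.replaceNode {}
}

// Replacements are applied while writing the tree out, so check the
// serialized String rather than querying root again
def validData = XmlUtil.serialize(root)
println validData
```

In the question's `processData`, the result would be passed to `message.setBody(validData)` as before.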