
How can I get a specific node from XML in Java and then remove it?


I have an XML file that I need to navigate. It looks something like this (abridged from the full file):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">
    <md:EntityDescriptor xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:saml2p="urn:oasis:names:tc:SAML:2.0:protocol" xmlns:xsi="http://www.w3.org/2001/XMLSchemainstance" ID="_id-83bbfdd3-e4c4-42cf-a024-e4733569a4ae" entityID="https://id.eht.eu">
        <md:Organization>
            <md:OrganizationName xml:lang="it">EtnaHitech</md:OrganizationName>  
            <md:OrganizationName xml:lang="en">EtnaHitech</md:OrganizationName>   
            <md:OrganizationDisplayName xml:lang="it">EHT</md:OrganizationDisplayName>     
            <md:OrganizationDisplayName xml:lang="en">EHT</md:OrganizationDisplayName>
        </md:Organization>
    </md:EntityDescriptor>
    <md:EntityDescriptor xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:xs="http://www.w3.org/2001/XMLSchemainstance" ID="_gh3s48d19e23e85be40k4ab5ey331e7k4f04f73fb5" entityID="https://id.lepida.it/idp/shibboleth">
        <md:Organization>
            <md:OrganizationName xml:lang="it">Lepida</md:OrganizationName> 
            <md:OrganizationDisplayName xml:lang="it">Lepida</md:OrganizationDisplayName>   
        </md:Organization>
    </md:EntityDescriptor>
</md:EntitiesDescriptor>

Let's say I want to get the md:EntityDescriptor element that has a specific value for its entityID attribute, for example entityID="https://id.eht.eu".

I tried to use XPath with Java, and this is my code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("C:/outputTest/input.xml");

XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//md:EntityDescriptor[@entityID='https://id.eht.eu']");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

But when I loop over the result:

if (nl != null && nl.getLength() != 0) {
    for (int i = 0; i < nl.getLength(); i++) {
        System.out.println(nl.item(i).getNodeValue());
    }
}

my NodeList is always empty, and I can't get this to work. I expected the NodeList to contain at least the specified node; after that, I plan to remove that node from the document entirely. In general, I need code that finds every md:EntityDescriptor node with a given entityID value and removes it from the document.
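A side issue that is independent of the empty result: getNodeValue() returns null for Element nodes, because the DOM defines nodeValue only for nodes such as Attr, Text and Comment. So even once the query matches, the loop above would print null. A minimal sketch, using a hypothetical class NodeDescriber with a describe() helper, that prints the tag name and the entityID attribute instead:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
class NodeDescriber {
    // Elements have no nodeValue (it is always null), so for elements we
    // report the qualified tag name and the entityID attribute instead.
    static String describe(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element elem = (Element) node;
            return elem.getTagName() + " entityID=" + elem.getAttribute("entityID");
        }
        // Text, Attr, Comment, etc. do carry a nodeValue.
        return node.getNodeName() + " = " + node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<md:EntityDescriptor xmlns:md=\"urn:oasis:names:tc:SAML:2.0:metadata\""
                + " entityID=\"https://id.eht.eu\"/>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeValue()); // prints "null"
        System.out.println(describe(root));      // prints "md:EntityDescriptor entityID=https://id.eht.eu"
    }
}
```

In the original loop, calling describe(nl.item(i)) instead of nl.item(i).getNodeValue() would show which elements matched.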


Solution

  • Your XML uses namespaces, and querying such documents with XPath works slightly differently (see the question "XPath with namespace in Java"). Based on that question, the simplest way to adapt your code is to change your XPath so that it matches the element by its local name, ignoring the prefix and namespace:

    //*[local-name()='EntityDescriptor'][@entityID='https://id.eht.eu']
    

    Next, you said you wanted to remove the elements returned by your query (see "Removing nodes from an XmlDocument"). You can do this by iterating over nl one Node at a time, getting each node's parent, and having the parent remove that node. Adapted to your own code, the process could look like this:

    for (int i = 0; i < nl.getLength(); i++) {
        Node elem = nl.item(i);
        // Debug output:
        // System.out.println(elem.getTextContent());
        elem.getParentNode().removeChild(elem);
    }
    

    Finally, you probably want to save the modified document (see "How to update XML using XPath and Java"). You can do something like this:

    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(new DOMSource(doc), new StreamResult(new File("C:/outputTest/output.xml")));
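For reference, here is a self-contained sketch that combines the three steps. It swaps in two techniques that are not part of the answer above. First, it parses with setNamespaceAware(true) and registers a NamespaceContext, so the original md:-prefixed expression can be used instead of local-name(). Second, it binds the entityID through an XPathVariableResolver ($id), so any value can be passed in without concatenating strings. The class and method names (EntityRemover, removeEntity) are hypothetical. The sketch returns the result as a string rather than writing C:/outputTest/output.xml, which keeps it easy to test.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical class name, for illustration only.
class EntityRemover {
    static final String MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata";

    // Maps the "md" prefix used in the XPath expression to the SAML metadata namespace.
    static final NamespaceContext MD_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "md".equals(prefix) ? MD_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return MD_NS.equals(uri) ? "md" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return MD_NS.equals(uri)
                    ? Collections.singletonList("md").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Removes every md:EntityDescriptor whose entityID equals entityId
    // and returns the serialized document.
    static String removeEntity(String xml, String entityId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so md:EntityDescriptor matches by namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(MD_CONTEXT);
        // Resolve the XPath variable $id to the caller's value.
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? entityId : null);
        NodeList nl = (NodeList) xpath.evaluate(
                "//md:EntityDescriptor[@entityID=$id]", doc, XPathConstants.NODESET);

        // Copy the matches into a list first, then detach each one from its parent.
        List<Node> matches = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            matches.add(nl.item(i));
        }
        for (Node node : matches) {
            node.getParentNode().removeChild(node);
        }

        // Serialize the modified DOM back to XML text.
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<md:EntitiesDescriptor xmlns:md=\"" + MD_NS + "\">"
                + "<md:EntityDescriptor entityID=\"https://id.eht.eu\"/>"
                + "<md:EntityDescriptor entityID=\"https://id.lepida.it/idp/shibboleth\"/>"
                + "</md:EntitiesDescriptor>";
        System.out.println(removeEntity(xml, "https://id.eht.eu"));
    }
}
```

To write a file as the answer does, pass new StreamResult(new File(...)) to transform instead of the StringWriter. Copying the matches into a List before removing them is a defensive choice: the removal then does not depend on whether the returned NodeList reflects later changes to the DOM.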