
Generate a SHA-256 hash for an XML document in Java


I'm trying to add a caching feature for HTTP requests in one of my projects, and I thought of using the ETag as the hash value. If the ETag is missing, I thought of generating a unique hash value from the payload instead. However, XML payloads with the same content can be structured differently. For example, Sample A and Sample B below contain the same data, but their string representations differ. What I need is a way to generate the same hash key from both XML samples.

Sample A

<note>
   <to>Tove</to>
   <from>Jani</from>
   <heading>Reminder</heading>
   <body>Don't forget me this weekend!</body>
</note>

Sample B

<note>
   <to>Tove</to>
   <heading>Reminder</heading>
   <from>Jani</from>
   <body>Don't forget me this weekend!</body>
</note>
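
To make the problem concrete, here is a minimal sketch of the naive approach described above: use the ETag when there is one, otherwise hash the raw payload text. The class and method names (`NaiveCacheKey`, `cacheKey`, `sha256Hex`) are hypothetical and only for illustration. Because the digest is computed over the raw characters, Sample A and Sample B would produce different keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the names are illustrative and not from any library.
class NaiveCacheKey {

    /** Uses the ETag when present, otherwise hashes the raw payload text. */
    static String cacheKey(String etag, String payload) {
        return (etag != null && !etag.isEmpty()) ? etag : sha256Hex(payload);
    }

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the input. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `NaiveCacheKey.cacheKey(null, sampleA)` and `NaiveCacheKey.cacheKey(null, sampleB)` yields two different strings, which is exactly the mismatch the question is about.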

Solution

  • org.w3c.dom.Document.normalizeDocument() does not alter the order of child elements.

    You could do this with a recursive traversal of the document, sorting the children at each level. However, consider whether this is more expensive than the operation you're trying to cache in the first place...

    Method

    • At each level, copy all the child nodes to a java.util.List implementation, e.g. ArrayList. This is required because org.w3c.dom.NodeList does not allow modification.
    • Sort the list using Collections.sort().
    • Remove the children from their parent.
    • Add the children back in sorted order.

    Note that this does not handle multiple elements with the same name but different contents, but it does solve your example.

    For example:

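    import java.io.File;
    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class SortXml { // class name is arbitrary
        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("test.xml"));
            sort(doc);

            // Serialize the sorted document back to a string
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));

            System.out.println(writer);
        }

        // Recursively reorder the children of each node by node name
        private static void sort(Node node) {
            // Copy to a List first: NodeList cannot be sorted or modified directly
            NodeList nodes = node.getChildNodes();
            List<Node> children = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                children.add(nodes.item(i));
            }
            for (Node child : children) {
                node.removeChild(child);
            }
            children.sort(Comparator.comparing(Node::getNodeName));
            for (Node child : children) {
                node.appendChild(child);
            }
            for (Node child : children) {
                sort(child);
            }
        }
    }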
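
The example above stops at printing the sorted document. As a rough sketch of the remaining step, the serialized string can be fed to `java.security.MessageDigest` to get the SHA-256 value the question asks for. The class and method names here (`XmlCacheKey`, `canonicalHash`) are hypothetical. The tie-break on `getTextContent()` is an addition that is not part of the method described above; it is one possible way to order same-named siblings and has not been checked against every document shape.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the class and method names are illustrative only.
class XmlCacheKey {

    /** Returns a hex SHA-256 digest of the XML after sorting sibling nodes. */
    static String canonicalHash(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            sort(doc);

            // Serialize the reordered DOM without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));

            // Hash the UTF-8 bytes of the serialized form.
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(writer.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not hash XML payload", e);
        }
    }

    // Recursive sort by node name; ties are broken by text content (an assumption).
    private static void sort(Node parent) {
        NodeList nodes = parent.getChildNodes();
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            children.add(nodes.item(i));
        }
        for (Node child : children) {
            parent.removeChild(child);
        }
        children.sort(Comparator.comparing(Node::getNodeName)
                .thenComparing(n -> String.valueOf(n.getTextContent())));
        for (Node child : children) {
            parent.appendChild(child);
            sort(child);
        }
    }
}
```

With this sketch, `XmlCacheKey.canonicalHash(sampleA)` and `XmlCacheKey.canonicalHash(sampleB)` should return the same string for the two samples in the question. Whitespace between elements and attribute differences are still part of the serialized text, so payloads that differ only in indentation would still hash differently. The sort also discards sibling order entirely, so it only suits payloads where element order carries no meaning.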