Tags: java, xml, doc, xmlunit

Compare two xml files ignoring certain elements using XPath in Java


How can I compare two XML files, ignoring certain elements using XPath?

For example, I need to compare the two XML files below while ignoring the 'Date' element, by passing the XPath of that element (//Set[1]/Product[1]/Date) at run time. The element to ignore can vary from run to run.

XML file 1:

<?xml version="1.0" encoding="utf-8"?>
<Set
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="urn:abc:product:v3" xsi:schemaLocation="urn:abc:product:v3 abc.xsd">
    <Product>
        <id>1</id>
        <ref>1</ref>
        <Date>2021-09-19</Date>
        <company>JJ</company>
        <lastModified>2021-09-20T21:00:30</lastModified>
        <productOne>
            <partProduct>
                <Level>3.0</Level>
                <Flag>0</Flag>
                <Code>EN</Code>
            </partProduct>
        </productOne>
    </Product>
</Set>

XML file 2:

<?xml version="1.0" encoding="utf-8"?>
<Set
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="urn:abc:product:v3" xsi:schemaLocation="urn:abc:product:v3 abc.xsd">
    <Product>
        <id>2</id>
        <ref>2</ref>
        <Date>2021-09-20</Date>
        <company>JJ</company>
        <lastModified>2021-09-20T21:00:30</lastModified>
        <productOne>
            <partProduct>
                <Level>3.0</Level>
                <Flag>0</Flag>
                <Code>EN</Code>
            </partProduct>
        </productOne>
    </Product>
</Set>

Solution

  • You need to transform both files into a form where they compare equal, by removing the elements you want to ignore. You would typically do this using XSLT. After the transformation you could either compare the results using the XPath 2.0 function deep-equal(), or serialise both documents as canonical XML and compare the files at the character or binary level.

    Because the element to ignore is supplied as a path expression at run time, I would delete the selected nodes by running XQuery Update, and then compare the resulting documents either using fn:deep-equal(), or by doing canonical serialization and comparing the resulting lexical forms.

    As an alternative to XQuery Update you could use xmlstarlet or Saxon's Gizmo tool.

    Which approach fits best depends on what you want from the comparison. The above is fine if you only need a yes/no answer; getting details of the differences is harder. For that you could write your own query to find the differences, or use a tool such as DeltaXML.

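
Since the question asks for Java, the "remove, then compare" idea can also be sketched with nothing but the JDK's DOM and XPath APIs. This is a minimal sketch, not the XSLT or XQuery Update route named in the answer: it swaps in plain DOM node removal and DOM `isEqualNode` in place of deep-equal() or canonical serialization. The class and method names (`XmlIgnoreCompare`, `parseAndStrip`, `equalIgnoring`) are hypothetical. It also assumes that parsing without namespace awareness is acceptable, so that unprefixed paths such as //Set[1]/Product[1]/Date match elements that sit in a default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the names are illustrative and not part of any library.
final class XmlIgnoreCompare {

    /** Parses the XML text and deletes every node selected by the given XPath expressions. */
    static Document parseAndStrip(String xml, String... ignoredPaths) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: namespace awareness is switched off so that unprefixed
            // paths match elements declared in a default namespace.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String path : ignoredPaths) {
                NodeList hits = (NodeList) xpath.evaluate(path, doc, XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    Node hit = hits.item(i);
                    Node parent = hit.getParentNode();
                    // Attribute nodes and the document node have no parent and are skipped.
                    if (parent != null) {
                        parent.removeChild(hit);
                    }
                }
            }
            // Merge the whitespace text nodes left adjacent by the removals.
            doc.normalize();
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or filter XML", e);
        }
    }

    /** Yes/no comparison of two documents after the ignored nodes have been removed. */
    static boolean equalIgnoring(String xmlA, String xmlB, String... ignoredPaths) {
        Document a = parseAndStrip(xmlA, ignoredPaths);
        Document b = parseAndStrip(xmlB, ignoredPaths);
        // isEqualNode compares names, attributes, and children recursively.
        // It is whitespace-sensitive, so indentation differences count as differences.
        return a.isEqualNode(b);
    }

    public static void main(String[] args) {
        // Shortened versions of the two files from the question.
        String a = "<Set xmlns=\"urn:abc:product:v3\"><Product><id>1</id>"
                + "<Date>2021-09-19</Date><company>JJ</company></Product></Set>";
        String b = "<Set xmlns=\"urn:abc:product:v3\"><Product><id>2</id>"
                + "<Date>2021-09-20</Date><company>JJ</company></Product></Set>";

        // Ignoring only Date: the id elements still differ.
        System.out.println(equalIgnoring(a, b, "//Set[1]/Product[1]/Date"));
        // Ignoring Date and id: the remaining content is identical.
        System.out.println(equalIgnoring(a, b,
                "//Set[1]/Product[1]/Date", "//Set[1]/Product[1]/id"));
    }
}
```

Note that in the two sample files from the question, id and ref differ as well as Date, so ignoring Date alone would still report a difference. Like the approaches above, this only gives a yes/no answer; it does not say where the documents differ.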