python xml canonicalization canonical-form

Normalize in canonical form two XML files in Python


I have two XML files and I need to check that they contain exactly the same information, regardless of tag or attribute order.

For instance, these two XML files should be considered equal:

test1.xml

<blocklist lastupdate="1459262434336" xmlns="http://www.mozilla.org/2006/addons-blocklist">
  <emItems>
    <emItem blockID="i454" id="[email protected]">
      <versionRange minVersion="0" maxVersion="*" severity="3">
        <targetApplication id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}">
          <versionRange maxVersion="3.6.*" minVersion="3.6"/>
        </targetApplication>
      </versionRange>
      <versionRange maxVersion="*" minVersion="0"/>
      <prefs>
        <pref>test.blocklist</pref>
      </prefs>
    </emItem>
  </emItems>
</blocklist>

test2.xml

<blocklist lastupdate="1459262434336" xmlns="http://www.mozilla.org/2006/addons-blocklist">
  <emItems>
    <emItem blockID="i454" id="[email protected]">
      <prefs>
        <pref>test.blocklist</pref>
      </prefs>
      <versionRange minVersion="0" maxVersion="*" severity="3">
        <targetApplication id="{ec8030f7-c20a-464f-9b0e-13a3a9e97384}">
          <versionRange maxVersion="3.6.*" minVersion="3.6"/>
        </targetApplication>
      </versionRange>
      <versionRange minVersion="0" maxVersion="*"/>
    </emItem>
  </emItems>
</blocklist>

I tried to find some solutions like:

I am also going to try this solution

Do you have any idea what my options are here? Aren't XML normalization and canonicalization supposed to handle this for me?

What am I doing wrong here?

If I were to do it in JSON I would use: json.dumps(data, sort_keys=True, separators=(',', ':'))
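
To make the JSON comparison concrete, here is a minimal sketch of what that call does. The helper name `canonical_json` and the two sample dictionaries are made up for illustration; only `json.dumps` and its `sort_keys` and `separators` arguments come from the standard library.

```python
import json


def canonical_json(data):
    # Hypothetical helper: sorted keys and compact separators give one
    # deterministic string per dictionary, whatever the insertion order.
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


# Same keys and values, inserted in a different order.
a = {"minVersion": "0", "maxVersion": "*"}
b = {"maxVersion": "*", "minVersion": "0"}

print(canonical_json(a))  # {"maxVersion":"*","minVersion":"0"}
print(canonical_json(a) == canonical_json(b))  # True
```

Note that `sort_keys` only reorders dictionary keys. JSON arrays keep their order, which is the same limitation I am running into with sibling elements in XML.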


Solution

  • For those who are interested in this subject, I created an XML verifier script that does this by converting each XML file into JSON, exporting both as canonical JSON, and then diffing the results.

    https://github.com/mozilla-services/amo2kinto/blob/1.7.2//amo2kinto/verifier.py#L80-L108
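
As for why canonicalization alone is not enough: XML canonicalization (C14N) normalizes attribute order, but it preserves the order of child elements, because element order is significant in XML. The sketch below is not the code from the linked script. It skips the JSON step and instead builds an order-insensitive key directly with `xml.etree.ElementTree`. The names `canonical_key` and `xml_equivalent` and the sample documents are hypothetical, and it assumes that sibling order never matters, that leading and trailing whitespace in text is insignificant, and that tail text can be ignored. `ET.canonicalize` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def canonical_key(element):
    """Build a nested tuple that ignores attribute order and sibling order.

    Assumption: tail text and surrounding whitespace carry no meaning.
    """
    attributes = tuple(sorted(element.attrib.items()))
    text = (element.text or "").strip()
    # Sorting the children's keys makes sibling order irrelevant.
    children = tuple(sorted(canonical_key(child) for child in element))
    return (element.tag, attributes, text, children)


def xml_equivalent(xml_a, xml_b):
    """Return True if both documents hold the same data in any order."""
    return canonical_key(ET.fromstring(xml_a)) == canonical_key(ET.fromstring(xml_b))


# Illustrative documents, much smaller than test1.xml and test2.xml.
DOC_A = '<root xmlns="urn:example"><item b="2" a="1"><x>t</x><y/></item></root>'
# Same data with attributes and children reordered.
DOC_B = '<root xmlns="urn:example"><item a="1" b="2"><y/><x>t</x></item></root>'
# Same data with only the attributes reordered.
DOC_C = '<root xmlns="urn:example"><item a="1" b="2"><x>t</x><y/></item></root>'
# Different data: attribute b has another value.
DOC_D = '<root xmlns="urn:example"><item a="1" b="3"><x>t</x><y/></item></root>'

# C14N hides the attribute reordering but not the child reordering.
print(ET.canonicalize(DOC_A) == ET.canonicalize(DOC_C))  # True
print(ET.canonicalize(DOC_A) == ET.canonicalize(DOC_B))  # False

# The order-insensitive key treats A and B as equal and still detects D.
print(xml_equivalent(DOC_A, DOC_B))  # True
print(xml_equivalent(DOC_A, DOC_D))  # False
```

This only fits formats like the blocklist above, where sibling order has no meaning. For a document format where order matters, sorting the children would hide real differences.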