Compare XML files using Python


I want to compare these two XML files:

File1.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>

File2.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>

I've used xmldiff to compare the two files (saved as a.xml and b.xml):

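from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # DiffFormatter renders the edit script as plain text, one action per line
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

# File paths have to be passed as strings
out = compare_xmls("a.xml", "b.xml")
print(out)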

OUTPUT:

[delete, /ngs_sample/results/gastro_prelim_st/type[2]]

Does anyone know how to identify what the actual difference between the two XML files is, i.e. which content has been deleted compared to b.xml? Can anyone recommend another way of comparing XML files in Python?


Solution

  • You can switch to the XMLFormatter, which returns the whole document annotated with diff: attributes, and then keep only the annotated lines:

    ...
    # Change formatter:
    formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)
    
    ...
    
    # after `out` has been retrieved:
    import re
    for i in out.splitlines():
      if re.search(r'\bdiff:\w+', i):
        print(i)
    
    # Result:
    #       <type st="9999" diff:delete=""/>
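
Another option, if you would rather keep the DiffFormatter output, is to take the path it reports and look the node up in the first file yourself. The following is only a minimal sketch using the standard library: `resolve_node` is a hypothetical helper (not part of xmldiff), the XML is inlined as a string instead of being read from a.xml, and it assumes the reported path uses nothing beyond plain tag names and positional predicates such as `type[2]`, which is the subset ElementTree understands.

```python
import xml.etree.ElementTree as ET

# Contents of the first file from the question, inlined for the example
FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""


def resolve_node(xml_text, path):
    """Hypothetical helper: return the element that an absolute path such as
    /ngs_sample/results/gastro_prelim_st/type[2] points to, or None."""
    root = ET.fromstring(xml_text)
    parts = path.strip("/").split("/")
    # ElementTree searches relative to the root element, so the leading
    # root segment is checked here and then dropped from the search path.
    if parts[0] != root.tag:
        return None
    if len(parts) == 1:
        return root
    return root.find("./" + "/".join(parts[1:]))


# Path copied from the DiffFormatter output shown in the question
node = resolve_node(FILE1, "/ngs_sample/results/gastro_prelim_st/type[2]")
print(node.tag, node.attrib)
```

This gives you the deleted element as an object, so you can read its tag and attributes directly instead of parsing formatter output with a regex.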