Tags: python, xml, xml-parsing, python-2.7, minidom

Which XML parser has the most human readable error reporting?


Python's standard library offers many ways to process XML, and external packages offer even more; see http://wiki.python.org/moin/PythonXml.

For my project I use minidom. It does what I need, but its error reporting is rather terse, for example:

no element found: line 7, column 0

This is correct but not very human-readable: it gives no hint as to which element is missing. With so little information, I cannot report the error to a user.

This is just one example; there are other cases where minidom could be more detailed but is not. I need an error message detailed enough that I can pass the parsing error back to a user.

Which of the standard XML processing modules has the most detailed error reporting? If none of them does, which external XML package has it?

The XML file used for parsing, referred to in the code as config.xml, is:

<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets">
    <icon src="icon.png"/>
    <content src="index.html"/>
<name>sample</name>
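
The failure can also be reproduced without a file on disk. The following is a minimal sketch, assuming Python 3 and the XML above held in a string; the names `BROKEN_XML` and `explain` are made up for illustration. It relies on the `lineno` attribute that `ExpatError` carries, which at least lets the caller show the last line of source next to the terse message:

```python
import xml.dom.minidom as md
from xml.parsers.expat import ExpatError

# Same content as config.xml above: the closing </widget> tag is missing.
BROKEN_XML = """<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets">
    <icon src="icon.png"/>
    <content src="index.html"/>
<name>sample</name>
"""


def explain(xml_text):
    """Return None if xml_text parses, else the parser message plus some source context."""
    try:
        md.parseString(xml_text)
    except ExpatError as err:
        lines = xml_text.splitlines()
        # err.lineno is 1-based; at end of input it can point past the last line,
        # so clamp it to the last line that actually exists.
        last = lines[min(err.lineno, len(lines)) - 1] if lines else ""
        return "%s (last text seen: %r)" % (err, last)
    return None


print(explain(BROKEN_XML))
```

This does not make the parser's own message any richer; it only attaches nearby source text, so the unclosed element still has to be spotted by the reader.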

Solution

  • I surveyed the parsers listed at the link above to see which one has the most useful error reporting, and stopped at lxml:

    # minidom (standard library)
    import xml.dom.minidom as md
    md.parse("config.xml")
    # xml.parsers.expat.ExpatError: no element found: line 7, column 0

    # ElementTree (standalone elementtree package)
    import elementtree.ElementTree as ET
    tree = ET.parse("config.xml")
    # xml.parsers.expat.ExpatError: no element found: line 7, column 0

    # SAX (standard library)
    from xml import sax
    parser = sax.make_parser()
    parser.parse("config.xml")
    # xml.sax._exceptions.SAXParseException: config.xml:7:0: no element found

    # cElementTree (standard library)
    import xml.etree.cElementTree as et
    et.parse("config.xml")
    # cElementTree.ParseError: no element found: line 7, column 0

    # pulldom (standard library); the error surfaces while iterating
    import xml.dom.pulldom as pd
    doc = pd.parse("config.xml")
    for event, node in doc:
        print event, node
    # xml.sax._exceptions.SAXParseException: <unknown>:7:0: no element found

    # lxml (external package)
    import lxml.etree
    tree = lxml.etree.parse("config.xml")
    # lxml.etree.XMLSyntaxError: Premature end of data in tag widget line 2, line 7, column 1
    

    The conclusion is that, of the parsers listed above, lxml has the best error reporting:

    "Premature end of data in tag widget line 2, line 7, column 1"
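
To hand such a message back to a user, the parse call can be wrapped so that the caller gets either a tree or a readable string. The following is a sketch, assuming Python 3; `load_config` is a hypothetical helper name, and the fallback to the standard library's ElementTree when lxml is not installed is an addition for illustration, not part of the survey above:

```python
import xml.etree.ElementTree as ET

try:
    import lxml.etree as LET  # external package; gives the more detailed messages
except ImportError:
    LET = None  # assumption: fall back to the standard library parser


def load_config(xml_bytes):
    """Parse XML given as bytes; return (root, None) or (None, error_message)."""
    if LET is not None:
        try:
            return LET.fromstring(xml_bytes), None
        except LET.XMLSyntaxError as err:
            return None, "Invalid XML: %s" % err
    try:
        return ET.fromstring(xml_bytes), None
    except ET.ParseError as err:
        return None, "Invalid XML: %s" % err


# The closing </widget> tag is missing, as in config.xml.
root, problem = load_config(b"<widget><name>sample</name>")
print(problem)
```

The input is passed as bytes because lxml rejects Unicode strings that carry an encoding declaration, which config.xml does.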