Python's standard library offers many ways to process XML, and even more are available as external packages; see http://wiki.python.org/moin/PythonXml.
For my project I use minidom. It does what I need, but its error reporting is rather terse, for example:
no element found: line 7, column 0
This is correct but not very human-readable: it gives no hint as to which element is missing. With so little information, I cannot report the error to a user.
This is just one example; there are other cases where minidom could be more detailed but is not. I need an error message detailed enough that I can pass the parsing error back to a user.
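To illustrate what "detailed enough" could look like with minidom alone, here is a minimal sketch that appends the offending source line to the expat message. The helper name describe_parse_error is hypothetical, the sketch uses Python 3 syntax, and it assumes the XML is available as a string rather than a file path.

```python
import xml.dom.minidom as md
from xml.parsers.expat import ExpatError


def describe_parse_error(xml_text):
    """Return None if xml_text parses, else the expat message plus context.

    Hypothetical helper: not part of minidom, just a wrapper around it.
    """
    try:
        md.parseString(xml_text)
        return None
    except ExpatError as err:
        lines = xml_text.splitlines()
        # err.lineno is 1-based; for truncated input it may point
        # one line past the end of the text.
        if 1 <= err.lineno <= len(lines):
            context = lines[err.lineno - 1]
        else:
            context = "<end of file>"
        return "%s\n  near: %s" % (err, context)
```

This does not make the message itself any smarter (it still will not name the unclosed element), but it at least shows the user where the parser gave up.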
The XML file that was parsed, referred to in the code as config.xml, is shown below; note that the closing </widget> tag is missing:
<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets">
<icon src="icon.png"/>
<content src="index.html"/>
<name>sample</name>
I surveyed the parsers listed at the link above to see which one has the most useful error reporting, and stopped at lxml:
import xml.dom.minidom as md
md.parse("config.xml")
#xml.parsers.expat.ExpatError: no element found: line 7, column 0
import elementtree.ElementTree as ET
tree = ET.parse("config.xml")
#xml.parsers.expat.ExpatError: no element found: line 7, column 0
from xml import sax
parser = sax.make_parser()
parser.parse("config.xml")
#xml.sax._exceptions.SAXParseException: config.xml:7:0: no element found
import xml.etree.cElementTree as et
et.parse("config.xml")
#cElementTree.ParseError: no element found: line 7, column 0
import xml.dom.pulldom as pd
doc = pd.parse("config.xml")
for event, node in doc:
    print event, node
#xml.sax._exceptions.SAXParseException: <unknown>:7:0: no element found
import lxml.etree
tree = lxml.etree.parse("config.xml")
#lxml.etree.XMLSyntaxError: Premature end of data in tag widget line 2, line 7, column 1
The conclusion is that, of the libraries listed above, lxml has the best error reporting:
"Premature end of data in tag widget line 2, line 7, column 1"
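For passing that message on to a user, something along these lines might work. It is only a sketch: the function name parse_or_explain and the "Invalid XML" wording are my own, it uses Python 3 syntax, and it assumes lxml is installed and the document is available as bytes.

```python
from lxml import etree


def parse_or_explain(xml_bytes):
    """Return (root, None) on success, or (None, message) on a syntax error.

    Hypothetical wrapper; the message format is an arbitrary choice.
    """
    try:
        return etree.fromstring(xml_bytes), None
    except etree.XMLSyntaxError as err:
        # err.msg carries the parser's description of the problem;
        # err.lineno is the line at which parsing stopped.
        return None, "Invalid XML (line %s): %s" % (err.lineno, err.msg)
```

The exact wording of the description comes from the underlying libxml2 library, so it may differ between versions.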