
Parsing a document with Python minidom


I have the following XML document that I have to parse using Python's minidom:

<?xml version="1.0" encoding="UTF-8"?>

<root>
    <bash-function activated="True">
        <name>lsal</name>
        <description>List directory content (-al)</description>
        <code>ls -al</code>
    </bash-function>

    <bash-function activated="True">
        <name>lsl</name>
        <description>List directory content (-l)</description>
        <code>ls -l</code>
    </bash-function>
</root>

Here is the code (the essential part) where I am trying to parse:

from modules import BashFunction
from xml.dom.minidom import parse

class FuncDoc(object):
    def __init__(self, xml_file):
        self.active_func = []
        self.inactive_func = []
        try:
            self.dom = parse(xml_file)
        except Exception as inst:
            # Parenthesized single-argument print works in Python 2 and 3.
            print(type(inst))
            print(inst.args)
            print(inst)

Unfortunately, parsing fails. Here is the output printed by the except block:

<class 'xml.parsers.expat.ExpatError'>
('no element found: line 1, column 0',)
no element found: line 1, column 0

I am a Python beginner. Could you please point me to the root of the problem?


Solution

  • I imagine you are passing in a file handle, in the following way:

    >>> from xml.dom.minidom import parse
    >>> xmldoc = open("xmltestfile.xml", "rU")
    >>> x = FuncDoc(xmldoc)
    

    I get the same error if I parse the same file handle twice without reopening or rewinding it in between. Try this; the error appears on the second parse attempt:

    >>> xmldoc.close()
    >>> xmldoc = open("xmltestfile.xml", "rU")
    >>> xml1 = parse(xmldoc)
    >>> xml2 = parse(xmldoc)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
        return expatbuilder.parse(file)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
        result = builder.parseFile(file)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/expatbuilder.py", line 211, in parseFile
        parser.Parse("", True)
    xml.parsers.expat.ExpatError: no element found: line 1, column 0
    

    The first parse reads the entire file and leaves the file handle positioned at the end. The second parse attempt therefore receives no data, which is why expat reports "no element found" at line 1, column 0. My guess is that parsing the document twice is a bug in your code. If, however, that is what you want to do, rewind the handle with xmldoc.seek(0) before parsing again.
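
The behaviour described above can be reproduced in a self-contained way. The following is only a minimal sketch: the temporary file, the shortened XML sample, and the variable names (`second_parse_failed`, `names`, and so on) are assumptions made for illustration and are not taken from the question's code. It shows the second parse failing on an exhausted handle, the `seek(0)` fix, and the alternative of passing a file name so that `parse` opens the file itself.

```python
import os
import tempfile
from xml.dom.minidom import parse
from xml.parsers.expat import ExpatError

# Shortened version of the question's document (illustrative only).
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<root>
    <bash-function activated="True">
        <name>lsal</name>
        <code>ls -al</code>
    </bash-function>
</root>
"""

# Write the sample to a temporary file so the example has something to open.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as handle:
    handle.write(XML_TEXT)

second_parse_failed = False
with open(path, "rb") as xmldoc:
    first = parse(xmldoc)        # reads the whole file; handle is now at EOF
    try:
        parse(xmldoc)            # nothing left to read -> "no element found"
    except ExpatError:
        second_parse_failed = True

    xmldoc.seek(0)               # rewind to the start of the file
    second = parse(xmldoc)       # parsing succeeds again

# Passing a file name instead of a handle avoids the problem entirely,
# because parse() then opens the file itself on every call.
third = parse(path)
os.remove(path)

names = [node.firstChild.data for node in second.getElementsByTagName("name")]
print(second_parse_failed, names)
```

If the document really does need to be parsed more than once, passing the path is probably the simpler choice, since there is no handle position to keep track of.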