
How to get the whole text of an Element in xml.dom.minidom?


While parsing some XHTML, I want to get the whole text (the full markup) inside an Element:

<div id='asd'>
  <pre>skdsk</pre>
</div>

Given E = the div element in the example above, I want to get

<pre>skdsk</pre>

How?


Solution

  • Strictly speaking:

    from xml.dom.minidom import parseString

    tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
    root = tree.documentElement  # the <div> element
    node = root.childNodes[0]    # the <pre> element
    print(node.toxml())          # <pre>skdsk</pre>
    

    In practice, though, I'd recommend looking at the BeautifulSoup library (http://www.crummy.com/software/BeautifulSoup/). Finding the right child node in an XHTML document and skipping "whitespace nodes" is a pain. BeautifulSoup is a robust HTML/XHTML parser with fantastic tree-search capabilities.

    Edit: The example above compresses the HTML into one string. If you use the HTML as formatted in the question, the line breaks and indentation generate "whitespace" text nodes, so the node you want won't be at childNodes[0].
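A minimal sketch using only the standard library, applied to the indented markup exactly as written in the question. The helpers `inner_xml` and `element_children` are hypothetical names for illustration, not part of minidom's API. Concatenating `toxml()` over an element's `childNodes` yields everything inside the element without its own tags, and filtering on `nodeType` skips the whitespace text nodes mentioned in the edit above.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The markup from the question, keeping its line breaks and indentation.
XHTML = """<div id='asd'>
  <pre>skdsk</pre>
</div>"""


def inner_xml(element):
    """Hypothetical helper: serialize all children of element, without its own tags."""
    return "".join(child.toxml() for child in element.childNodes)


def element_children(element):
    """Hypothetical helper: return only Element children, skipping text nodes."""
    return [child for child in element.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


doc = parseString(XHTML)
div = doc.documentElement  # E in the question: the <div id='asd'> element

# childNodes[0] is the whitespace text node "\n  ", not <pre>.
print(repr(div.childNodes[0].data))  # prints '\n  '

# Everything inside the div, including the surrounding whitespace.
print(repr(inner_xml(div)))  # prints '\n  <pre>skdsk</pre>\n'

# The first real element child, serialized on its own.
pre = element_children(div)[0]
print(pre.toxml())  # prints <pre>skdsk</pre>
```

If the surrounding whitespace is unwanted, calling `.strip()` on the result of `inner_xml` is one simple option; it only trims the outer whitespace and leaves whitespace between sibling elements intact.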