python xml-parsing elementtree minidom

XML Parsing: Element Tree (etree) vs. minidom


I've been using minidom to parse XML for years. Now I've suddenly learned about Element Tree. My question is: which is better for parsing? That is:

  • Which is faster?
  • Which uses less memory?
  • Does either have any O(n^2) behavior I should worry about?
  • Is one being deprecated in favor of the other?

Why do we have two interfaces?

Thanks.


Solution

  • Python probably has two interfaces because ElementTree was added to the standard library well after minidom already existed. The likely reason for adding it was its far more "Pythonic" API compared to the W3C-controlled DOM.

    If you're concerned about speed, there's also lxml, which builds an ElementTree-compatible tree using libxml2 and should be quite fast. The lxml project has a benchmark suite comparing it to ElementTree's Python and C implementations.

    If you're concerned about memory use, you shouldn't be using a tree API anyway; PullDOM (xml.dom.pulldom) might be a better choice. I'm extrapolating from experience using Java's excellent pull parser, though, as there doesn't seem to be much current information on PullDOM.
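
To make the comparison above a bit more concrete, here is a minimal sketch that pulls the same values out of one small document with all three standard-library interfaces mentioned in this answer. The sample XML and the function names are made up for illustration, and the sketch says nothing about relative speed or memory; it only shows how the three APIs differ in style.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom, pulldom

# Hypothetical sample document, invented for this illustration.
XML = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""


def titles_minidom(text):
    # W3C DOM style: the whole tree is built, text lives in child nodes.
    doc = minidom.parseString(text)
    return [
        book.getElementsByTagName("title")[0].firstChild.data
        for book in doc.getElementsByTagName("book")
    ]


def titles_etree(text):
    # ElementTree style: the whole tree is built, text is an attribute
    # of the element and lookups use simple path expressions.
    root = ET.fromstring(text)
    return [book.findtext("title") for book in root.iter("book")]


def titles_pulldom(text):
    # Pull style: events are streamed, and only the <title> subtree
    # being handled is expanded into DOM nodes.
    titles = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            events.expandNode(node)
            # Join text children in case the parser split the text.
            titles.append("".join(
                child.data for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            ))
    return titles


if __name__ == "__main__":
    print(titles_minidom(XML))
    print(titles_etree(XML))
    print(titles_pulldom(XML))
```

All three functions should return the same list of titles. For real speed or memory numbers you would still need to measure against your own documents, for example with timeit and tracemalloc.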