Tags: python, elementtree, monkeypatching, celementtree

Is there a better way to give elements knowledge of their parents and XPath in xml.etree.ElementTree?


I have the following code which works:

import xml.etree.ElementTree as etree


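def get_path(self):
    # Build a simple path such as root/a[2] by walking the parent links.
    path = self.tag
    # Index the node only when it has same-tag siblings (the root has no parent).
    if self.parent is not None:
        sibs = self.parent.findall(self.tag)
        if len(sibs) > 1:
            path = path + '[%s]' % (sibs.index(self) + 1)
    current_node = self
    while True:
        parent = current_node.parent
        # Compare with None: an Element's truth value depends on its children.
        if parent is None:
            break
        path = parent.tag + '/' + path
        current_node = parent
    return path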

etree._Element.get_path = get_path
etree._Element.parent = None

class XmlDoc(object):
    def __init__(self):
        self.root = etree.Element('root')
        self.doc = etree.ElementTree(self.root)

    def SubElement(self, parent, tag):
        new_node = etree.SubElement(parent, tag)
        new_node.parent = parent
        return new_node

doc = XmlDoc()
a1 = doc.SubElement(doc.root, 'a')
a2 = doc.SubElement(doc.root, 'a')
b = doc.SubElement(a2, 'b')
print etree.tostring(doc.root), '\n'
print 'element:'.ljust(15), a1
print 'path:'.ljust(15), a1.get_path()
print 'parent:'.ljust(15), a1.parent, '\n'
print 'element:'.ljust(15), a2
print 'path:'.ljust(15), a2.get_path()
print 'parent:'.ljust(15), a2.parent, '\n'
print 'element:'.ljust(15), b
print 'path:'.ljust(15), b.get_path()
print 'parent:'.ljust(15), b.parent

Which results in this output:

<root><a /><a><b /></a></root> 

element:        <Element a at 87e3d6c>
path:           root/a[1]
parent:         <Element root at 87e3cec> 

element:        <Element a at 87e3fac>
path:           root/a[2]
parent:         <Element root at 87e3cec> 

element:        <Element b at 87e758c>
path:           root/a/b
parent:         <Element a at 87e3fac>

This is drastically changed from the original code, which I'm not allowed to share.

The functions themselves aren't too inefficient, but performance drops dramatically when switching from cElementTree to ElementTree. I expected that, but from my experiments it seems that monkey patching cElementTree is impossible, so I had to switch.

What I need to know is whether there is a way to add a method to cElementTree, or a more efficient way of doing this, so I can gain some of my performance back.

As a last resort I am considering adding selective static typing and compiling with Cython, but for certain reasons I really don't want to do that.

Thanks for taking a look.

EDIT: Sorry for the wrong use of the term late binding. Sometimes my vocabulary leaves something to be desired. What I meant was "monkey patching."

EDIT: @Corley Brigman, Guy: Thank you very much for your answers, which do address the question. However (and I should have stated this in the original post), I had previously completed this project using lxml, a wonderful library that made coding a breeze. Due to new requirements, this now needs to be implemented as an add-on to a product called Splunk. That ties me to the Python 2.7 interpreter shipped with Splunk and eliminates the possibility of adding third-party libraries, with the exception of Django.


Solution

  • If you need parents, use lxml instead - it tracks parents internally, and is still C behind the scenes so it's very fast.

    Be aware, however, that tracking parents involves a tradeoff: a given node can have only a single parent. This usually isn't a problem, but if you do something like the following, you will get different results in cElementTree and lxml:

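    # For lxml, import the same names from lxml.etree instead.
    from xml.etree.cElementTree import Element, SubElement, dump

    p = Element('x')
    q = Element('y')
    r = SubElement(p, 'z')   # r starts out as a child of p
    q.append(r)              # r is then also appended to q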
    

    cElementTree:

    dump(p)
    <x><z /></x>
    dump(q)
    <y><z /></y>
    

    lxml:

    dump(p)
    <x/>
    dump(q)
    <y>
      <z/>
    </y>
    

    Because lxml tracks parents, a node can have only one parent. As the output shows, cElementTree leaves r in both trees (p and q reference the same element object; it is not copied), whereas lxml moves it from p to q.

    There are probably only a small number of use cases where this matters, but it is something to keep in mind.
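If lxml is not an option, as the question's edit indicates, one possible stdlib-only approach is to leave the element class alone and keep the parent links in a separate dictionary built in a single pass over the tree. The following is only a sketch under that assumption, not the answer author's method: `build_parent_map` and `get_path` are hypothetical helper names, and the map has to be rebuilt whenever the tree changes. Unlike the code in the question, this variant indexes every step of the path, so `b` is reported under `a[2]`.

```python
# On Python 2.7, xml.etree.cElementTree can be imported here instead.
import xml.etree.ElementTree as etree


def build_parent_map(root):
    """Map every element to its parent; the root itself has no entry."""
    # dict() over a generator also works on older interpreters.
    return dict((child, parent) for parent in root.iter() for child in parent)


def get_path(node, parent_map):
    """Return a simple path such as 'root/a[2]/b' (hypothetical helper)."""
    steps = []
    while node is not None:
        parent = parent_map.get(node)  # None once the root is reached
        step = node.tag
        if parent is not None:
            sibs = parent.findall(node.tag)
            if len(sibs) > 1:
                # 1-based position among same-tag siblings, matched by identity
                index = [i for i, s in enumerate(sibs) if s is node][0] + 1
                step = '%s[%d]' % (step, index)
        steps.append(step)
        node = parent
    return '/'.join(reversed(steps))


root = etree.Element('root')
a1 = etree.SubElement(root, 'a')
a2 = etree.SubElement(root, 'a')
b = etree.SubElement(a2, 'b')

parent_map = build_parent_map(root)
paths = [get_path(n, parent_map) for n in (a1, a2, b)]
print(paths)  # ['root/a[1]', 'root/a[2]', 'root/a[2]/b']
```

Because nothing is attached to the element class or its instances, this does not depend on being able to monkey patch the C implementation. The cost is one extra traversal to build the map, plus a `findall` call per path step.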