
Order of elements from minidom getElementsByTagName


Is the order of the elements returned by minidom's getElementsByTagName the same as their order in the document, for elements at the same level of the hierarchy?

    images = svg_doc.getElementsByTagName('image') 
    image_siblings = []
    for img in images:
        if img.parentNode.getAttribute('layertype') == 'transfer':
            if img.nextSibling is not None:
                if img.nextSibling.nodeName == 'image':
                    image_siblings.append(img.nextSibling)
                elif img.nextSibling.nextSibling is not None and img.nextSibling.nextSibling.nodeName == 'image':
                    image_siblings.append(img.nextSibling.nextSibling)

I need to know whether image_siblings will contain the images in the same order in which they appear in the document, for images at the same level of the hierarchy.

I found a similar question for JavaScript, but I'm unsure if this is also true for Python (version 3.5.2) Minidom getElementsByTagName.


Solution

  • According to the source code (in Python 2.7), the getElementsByTagName method relies on the _get_elements_by_tagName_helper function, whose code is:

    def _get_elements_by_tagName_helper(parent, name, rc):
        for node in parent.childNodes:
            if node.nodeType == Node.ELEMENT_NODE and \
                (name == "*" or node.tagName == name):
                rc.append(node)
            _get_elements_by_tagName_helper(node, name, rc)
        return rc
    

    What this means is that the order of the result of getElementsByTagName is the same as the order you have in childNodes.

    But this holds only if the tagName appears at a single level. Notice the recursive call to _get_elements_by_tagName_helper inside the function itself: each node is appended before its descendants are visited, so the result follows a depth-first, pre-order traversal. Elements with the same tagName that sit deeper in the tree are therefore interleaved with the ones at a higher level, each at the position where it occurs in the document.
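    The difference between the recursive search and a same-level search can be sketched as follows. This is a minimal illustration, assuming a made-up document; direct_children_by_tag is a hypothetical helper, not part of minidom.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Made-up document: one <item> is nested one level deeper than the others.
XML = "<root><item id='a'/><group><item id='b'/></group><item id='c'/></root>"


def direct_children_by_tag(parent, name):
    """Hypothetical helper: element children of `parent` named `name`,
    in childNodes order, without descending into the tree."""
    return [
        node for node in parent.childNodes
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == name
    ]


root = parseString(XML).documentElement

# Recursive search: nested elements appear where they occur in the document.
all_items = [e.getAttribute("id") for e in root.getElementsByTagName("item")]

# Same-level search: only the direct children of <root>.
same_level = [e.getAttribute("id") for e in direct_children_by_tag(root, "item")]

print(all_items)   # ['a', 'b', 'c']
print(same_level)  # ['a', 'c']
```

    If only one level matters, filtering childNodes directly avoids any dependence on how the recursive search orders nested matches.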

    If by document you mean an XML text file or a string, the question then becomes whether the parser preserves the order when creating the elements in the DOM. The parse function from xml.dom.minidom relies on the pyexpat library, which in turn uses the expat C library.
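    One way to check this empirically is to record the order in which expat reports start tags and compare it with what minidom returns. The sketch below assumes a small made-up document; expat_start_order is a hypothetical helper name, and the check only demonstrates the behaviour for this input rather than proving it in general.

```python
import xml.parsers.expat
from xml.dom.minidom import parseString

# Made-up document with one nested <tagName>.
XML = (
    '<head>'
    '<tagName myatt="1"/>'
    '<otherTag><tagName myatt="1.1"/></otherTag>'
    '<tagName myatt="2"/>'
    '</head>'
)


def expat_start_order(xml_text, tag, attr):
    """Hypothetical helper: collect `attr` of every `tag` start event,
    in the order in which expat reports them."""
    seen = []

    def on_start(name, attrs):
        if name == tag:
            seen.append(attrs.get(attr))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen


expat_order = expat_start_order(XML, "tagName", "myatt")
dom_order = [
    e.getAttribute("myatt")
    for e in parseString(XML).getElementsByTagName("tagName")
]

print(expat_order)
print(dom_order)
```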

    So, the short answer would be:

    If the tagName is present at only one level of the hierarchy in the XML DOM, then the order is respected. If the same tagName also appears in nodes deeper in the tree, those elements will be interleaved with the ones at the higher level. The order that is respected is the order of the elements in the minidom document object, which in turn depends on the parser.

    Look at this example:
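    Applied to the situation in the question, this could look like the sketch below. The SVG content is made up for illustration and next_element_sibling is a hypothetical helper; unlike the original loop, which skips at most one intervening node, it skips every non-element sibling (such as whitespace text nodes).

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Made-up SVG fragment with three images in one "transfer" layer.
SVG = """<svg>
  <g layertype="transfer">
    <image id="a"/>
    <image id="b"/>
    <image id="c"/>
  </g>
</svg>"""


def next_element_sibling(node):
    """Hypothetical helper: next sibling that is an element, or None."""
    sibling = node.nextSibling
    while sibling is not None and sibling.nodeType != Node.ELEMENT_NODE:
        sibling = sibling.nextSibling
    return sibling


doc = parseString(SVG)
image_siblings = []
for img in doc.getElementsByTagName("image"):
    if img.parentNode.getAttribute("layertype") == "transfer":
        sibling = next_element_sibling(img)
        if sibling is not None and sibling.nodeName == "image":
            image_siblings.append(sibling)

ids = [s.getAttribute("id") for s in image_siblings]
print(ids)  # ['b', 'c']
```

    Because all the images here sit at the same level, the collected siblings come out in the same order as in the source text.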

    >>> from xml.dom.minidom import parseString
    >>> s = '''<head>
    ...   <tagName myatt="1"/>
    ...   <tagName myatt="2"/>
    ...   <tagName myatt="3"/>
    ...   <otherTag>
    ...     <otherDeeperTag>
    ...       <tagName myatt="3.1"/>
    ...       <tagName myatt="3.2"/>
    ...       <tagName myatt="3.3"/>
    ...     </otherDeeperTag>
    ...   </otherTag> 
    ...   <tagName myatt="4"/>
    ...   <tagName myatt="5"/>
    ... </head>'''
    >>> doc = parseString(s)
    >>> for e in doc.getElementsByTagName('tagName'):
    ...     print(e.getAttribute('myatt'))
    ... 
    1
    2
    3
    3.1
    3.2
    3.3
    4
    5
    

    It seems the parser respects the ordering of the XML string (most parsers do, because it is the simplest thing to do), but I couldn't find anything in the Python documentation that confirms it. That said, the order of child elements is significant in XML: the XML specification only declares the order of attribute specifications within a tag to be insignificant. The W3C DOM specification also defines getElementsByTagName as returning elements in the order in which they are encountered in a preorder traversal of the tree, which matches the helper function shown earlier. So a parser or DOM implementation that reordered elements would not be behaving as those specifications describe.