python sorting attributes minidom

Sort nodes after using getElementsByTagName by the nodes attributes


EDIT

The dictionary is the offender here. The answer marked on this question works, but a plain dictionary does not preserve order. Sorting the dictionary is the answer in this case, but now I know how to sort nodes by their attributes, and so do you.

END

I am so happy to be asking Python questions. Here is what I have:

def parse_fixed_data(self, format):
    return_message = {}
    nodes = format.getElementsByTagName('data')
    for node in nodes:
        return_message[node.attributes['name'].value] = self.raw_message[int(node.attributes['from'].value):int(node.attributes['to'].value)] 
    return return_message

This works almost beautifully. The 'format' variable is an already-parsed node that contains a bunch of 'data' nodes. Here is the XML:

<pmbmsg id='pmb_header'>
    <version maj='01' min='00' rev='0000' type='FIXED' delimeter=''>
        <data seq='1'   from='0'   to='3'    name='message_type'/>
        <data seq='2'   from='3'   to='13'   name='version'/>
        <data seq='3'   from='13'  to='33'   name='from_system'/>
        <data seq='4'   from='33'  to='53'   name='to_system'/>
        <data seq='5'   from='53'  to='73'   name='family'/>
        <data seq='6'   from='73'  to='83'   name='priority'/>
        <data seq='7'   from='83'  to='103'  name='msg_format_id'/>
        <data seq='8'   from='103' to='135'  name='msg_unique_id'/>
        <data seq='9'   from='135' to='161'  name='created'/>
        <data seq='10'  from='161' to='163'  name='hop_count'/>
        <data seq='11'  from='163' to='173'  name='original_msg_format_id'/>
        <data seq='12'  from='173' to='205'  name='original_unique_id'/>
        <data seq='13'  from='205' to='245'  name='padding'/>
        <data seq='14'  from='245' to='4086' name='message_data'/>
    </version>
</pmbmsg>

This works well, but I get the dictionary elements back in this order:

u'to_system'            
u'padding'          
u'original_msg_format_id'   
u'original_unique_id'       
u'family'           
u'created'          
u'msg_format_id'        
u'hop_count'            
u'msg_unique_id'            
u'priority'         
u'version'          
u'from_system'          
u'message_type'         
u'message_data'

(values removed)

I would like them to come back in the order they appear in the XML, and their seq attribute could help with this. After this line in the Python code:

nodes = format.getElementsByTagName('data')

...is there some function I could run on nodes that would sort this? Or is there something I could state when getting the nodes that would let it know to sort them? You would think that it would just naturally get it in the order the xml is written?

If there is no function to do this auto-magically for me, I can handle hacking it.


Solution

  • The list of nodes is not the problem: getElementsByTagName returns the nodes in the order they appear in the XML, and a list keeps that order. Lists, by definition, are ordered. Dictionaries are not. The problem is that when you iterate over the dictionary keys, the names come back in an arbitrary order, and there is no way around this short of sorting the keys or using an ordered mapping.
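
    To check the claim about the node list, here is a minimal sketch that parses a trimmed copy of the question's XML (first four fields only, my own shortening) and reads the attributes back. It only shows that the list follows document order; it says nothing about the dictionary built afterwards.

```python
from xml.dom import minidom

# Trimmed copy of the question's format definition (first four fields only).
XML = """<pmbmsg id='pmb_header'>
    <version maj='01' min='00' rev='0000' type='FIXED'>
        <data seq='1' from='0'  to='3'  name='message_type'/>
        <data seq='2' from='3'  to='13' name='version'/>
        <data seq='3' from='13' to='33' name='from_system'/>
        <data seq='4' from='33' to='53' name='to_system'/>
    </version>
</pmbmsg>"""

doc = minidom.parseString(XML)
nodes = doc.getElementsByTagName('data')

# The node list follows document order, so no sorting is needed at this point.
names_in_document_order = [node.attributes['name'].value for node in nodes]
seqs_in_document_order = [int(node.attributes['seq'].value) for node in nodes]
```

    The order is lost only later, when the names are used as keys of a plain dict on the Python 2 interpreter the question is running.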

    You can sort the nodes before building the dict (which still does not guarantee that the dict itself will be ordered):

    >>> [node.attributes['name'].value for node in sorted(nodes, key=lambda x: x.attributes['name'].value)]
    [u'created', u'family', u'from_system', u'hop_count', 
    u'message_data', u'message_type', u'msg_format_id', u'msg_unique_id', 
    u'original_msg_format_id', u'original_unique_id', u'padding', u'priority', 
    u'to_system', u'version']
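
    The question asks for the order given by the seq attribute rather than by name. A sketch of that variant follows; the input is a hypothetical, deliberately shuffled subset of the question's fields, and seq_of is a helper name I made up. The one thing to watch is that attribute values are strings, so the key converts them to int.

```python
from xml.dom import minidom

# Hypothetical input: three of the question's fields, deliberately out of order.
XML = """<version>
    <data seq='10' from='161' to='163' name='hop_count'/>
    <data seq='2'  from='3'   to='13'  name='version'/>
    <data seq='1'  from='0'   to='3'   name='message_type'/>
</version>"""

nodes = minidom.parseString(XML).getElementsByTagName('data')

def seq_of(node):
    # Convert to int so that '10' sorts after '2'; compared as text, '10' < '2'.
    return int(node.attributes['seq'].value)

# Numeric sort on seq: the order the question asks for.
by_seq = [n.attributes['name'].value for n in sorted(nodes, key=seq_of)]

# Same sort without the int() conversion, shown only to illustrate the pitfall.
by_seq_as_text = [n.attributes['name'].value
                  for n in sorted(nodes, key=lambda n: n.attributes['seq'].value)]
```

    Here by_seq puts hop_count last, while by_seq_as_text places it between the other two because '10' compares lower than '2' as a string.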
    

    Or you can use collections.OrderedDict (available in Python 2.7+) instead of a normal dictionary to create return_message.

    # No example because I don't have access to Python 2.7
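
    For readers who do have Python 2.7 or later, a minimal sketch of the OrderedDict approach might look like this. It is an adaptation, not the asker's code: parse_fixed_data is rewritten as a free function that takes raw_message as a parameter instead of reading self.raw_message, and the XML and message below are made up for illustration.

```python
from collections import OrderedDict
from xml.dom import minidom

def parse_fixed_data(format_node, raw_message):
    """Slice raw_message into fields, keeping the order of the <data> nodes."""
    return_message = OrderedDict()
    for node in format_node.getElementsByTagName('data'):
        name = node.attributes['name'].value
        start = int(node.attributes['from'].value)
        end = int(node.attributes['to'].value)
        return_message[name] = raw_message[start:end]
    return return_message

# Trimmed format definition: the first three fields from the question.
XML = """<version>
    <data seq='1' from='0'  to='3'  name='message_type'/>
    <data seq='2' from='3'  to='13' name='version'/>
    <data seq='3' from='13' to='33' name='from_system'/>
</version>"""

# Made-up 33-character message, for illustration only.
raw = 'ABC' + '0100000000' + 'sender'.ljust(20)

fields = parse_fixed_data(minidom.parseString(XML).documentElement, raw)
field_names = list(fields)  # keys come back in insertion (document) order
```

    Because the nodes arrive in document order and OrderedDict remembers insertion order, iterating over fields yields the names in the order they are written in the XML, with no explicit sort.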
    

    Or you can sort your dictionary's items by key using sorted(). The result is a list of (key, value) tuples, not a dict:

    >>> import operator
    >>> sorted_return_message = sorted(return_message.iteritems(), key=operator.itemgetter(0))
    >>> for k,v in sorted_return_message: print k
    ... 
    created
    family
    from_system
    hop_count
    message_data
    message_type
    msg_format_id
    msg_unique_id
    original_msg_format_id
    original_unique_id
    padding
    priority
    to_system
    version
    

    Or you can just sort the keys as you iterate:

    >>> for k in sorted(return_message):
    ...     print k
    ... 
    created
    family
    from_system
    hop_count
    message_data
    message_type
    msg_format_id
    msg_unique_id
    original_msg_format_id
    original_unique_id
    padding
    priority
    to_system
    version