
Python: How can I use lxml objectify's iterchildren to get details of siblings which are in a different namespace


This is my XML file:

get_subscribers_result.xml

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <ns3:GetSubscriberResponse xmlns:ns3="http://example.com/123/ss/subscribermgmt/v1_0" xmlns:ns2="http://example.com/123/ss/base/v1_0" xmlns:ns4="http://example.com/123/ss/xyz/v1_0" >
            <ns3:subscriber>
                <ns2:created>2015-10-20T16:02:58.831Z</ns2:created>
                <ns2:createdBy>admin</ns2:createdBy>
                <ns2:lastModified>2015-10-20T16:02:58.824Z</ns2:lastModified>
                <ns2:lastModifiedBy>super</ns2:lastModifiedBy>
                <ns2:subscriberDetail>
                    <ns2:key>address</ns2:key>
                    <ns2:value>1st vivekanandar street</ns2:value>
                </ns2:subscriberDetail>
                <ns2:subscriberDetail>
                    <ns2:key>state</ns2:key>
                    <ns2:value>Abu Dhabi</ns2:value>
                </ns2:subscriberDetail>
            </ns3:subscriber>
        </ns3:GetSubscriberResponse>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Now, I have seen an example from http://davekuhlman.org/Objectify_files/weather_test.py where iterchildren is used.

That approach works when there are no namespaces. The code below would have worked for my XML if the document did not use namespaces.

    obj2 = lxml.objectify.parse("get_subscribers_result.xml")
    root = obj2.getroot()

    tag = '{http://example.com/123/ss/base/v1_0}subscriberDetail'

    for subscriberDetail in enumerate(root.subscriber.iterchildren(tag=tag)):
                   print subscriberDetail.key
                   print subscriberDetail.value
                   print "*********************************"

If I run this, I get:

AttributeError: no such child: {http://schemas.xmlsoap.org/soap/envelope/}subscriber

That makes sense: objectify looks up subscriber in the namespace of the root element (the SOAP envelope namespace), and subscriber does not belong to that namespace.

I tried

    for subscriberDetail in enumerate(root.{http://example.com/123/ss/subscribermgmt/v1_0}subscriber.iterchildren(tag=tag)):

Any ideas on how to make this work when namespaces are present?


Solution

  • You can use the namespace-aware xpath function instead and explicitly specify the namespace:

    from lxml import objectify

    # parse() returns an ElementTree; getroot() gives the objectified root element.
    obj2 = objectify.parse('get_subscribers_result.xml')
    root = obj2.getroot()

    # Map a prefix of your choice to the namespace URI used in the document.
    namespaces = {'ns2': 'http://example.com/123/ss/base/v1_0'}

    for subscriberDetail in root.xpath('//ns2:subscriberDetail', namespaces=namespaces):
        print(subscriberDetail.key)
        print(subscriberDetail.value)
        print("*********************************")
    

    If you want to iterate over all the nodes, including their descendants, you can do something like this:

    ns = {'SOAP-ENV': 'http://schemas.xmlsoap.org/soap/envelope/'}
    # Loop over the XPath result itself: iterating an objectify element
    # yields its same-tag siblings, not its children.
    for element in root.xpath('//SOAP-ENV:Envelope/descendant-or-self::*', namespaces=ns):
        # Strip the '{namespace-uri}' part to get the local tag name.
        cleaned_tag = element.tag.replace('{' + element.nsmap[element.prefix] + '}', '')
        if element.text:
            print("%s --> %s" % (element.prefix + ':' + cleaned_tag, element.text))