Getting valueless elements in python lxml


I've been using lxml's "objectify" module to parse my XML files and I've run into a problem: if a tag has no value (no text content), I can't find a way to get its attributes.

For instance:

import lxml.objectify

xml_obj = lxml.objectify.fromstring("""
<A>
    <B foo="baz"/>
    <B foo="bar"/>
</A>""")
print(lxml.objectify.dump(xml_obj))

A = None [ObjectifiedElement]
    B = u'' [StringElement]
      * foo = 'baz'
    B = u'' [StringElement]
      * foo = 'bar'

As you can see, the two B tags are turned into StringElement objects, but the dump shows that their attributes are still there, so there should be a way to retrieve them!


Solution

  • import lxml.objectify as objectify
    import lxml.etree as ET
    
    content = """
    <A>
        <B foo="baz"/>
        <B foo="bar"/>
    </A>"""
    xml_obj = objectify.fromstring(content)
    print(xml_obj.getchildren())
    # [u'', u'']
    

    You can access the element's attributes using elt.attrib:

    for child in xml_obj.getchildren():
        print(child.attrib)
    # {'foo': 'baz'}
    # {'foo': 'bar'}
    

    You can modify those attributes as well:

    xml_obj.B.attrib['baz'] = 'boo'
    xml_obj.B[1].attrib['foo'] = 'blah'
    

    Serializing xml_obj with ET.tostring shows the result:

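    # encoding='unicode' returns text rather than bytes, so print() shows plain XML
    print(ET.tostring(xml_obj, pretty_print=True, encoding='unicode'))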
    # <A>
    #   <B foo="baz" baz="boo"/>
    #   <B foo="blah"/>
    # </A>
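
If you only need one attribute rather than the whole mapping, the element's get() and set() methods are an alternative to going through .attrib. The following is a minimal sketch on the same sample XML; the names foo_values and missing and the attribute name 'nope' are made up for illustration:

```python
import lxml.objectify as objectify

content = """
<A>
    <B foo="baz"/>
    <B foo="bar"/>
</A>"""
xml_obj = objectify.fromstring(content)

# Read a single attribute from every child element.
foo_values = [child.get('foo') for child in xml_obj.iterchildren()]
print(foo_values)

# get() accepts a default that is returned when the attribute is absent.
missing = xml_obj.B.get('nope', 'n/a')
print(missing)

# set() is the method counterpart of assigning into .attrib.
xml_obj.B[1].set('foo', 'blah')
print(xml_obj.B[1].attrib['foo'])
```

Since get() returns None (or the default you pass) for an absent attribute, it avoids the KeyError that indexing .attrib directly would raise.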