Tags: python, xml, lxml, lxml.objectify

Objectify an XML string with dashes in tag and attribute names


I am using lxml to objectify an XML string that has dashes in its tag names.

For example:

from lxml import objectify
xml_string = """<root>
                   <foo-foo name="example" foo-description="description">
                       <bar doc-name="name" />
                       <test tag="test" />
                    </foo-foo>
                </root>"""
obj = objectify.fromstring(xml_string)

After this step, the element names still contain dashes, so I can't access foo-foo as an attribute: Python parses obj.foo-foo as a subtraction.
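For reference, the dashed names remain reachable by string even without renaming. The sketch below is a minimal illustration of that, assuming the same sample document; the variable names `foo`, `description`, and `doc_name` are introduced here for illustration only.

```python
from lxml import objectify

xml_string = """<root>
  <foo-foo name="example" foo-description="description">
    <bar doc-name="name" />
    <test tag="test" />
  </foo-foo>
</root>"""

obj = objectify.fromstring(xml_string)

# obj.foo-foo would be read by Python as (obj.foo) - foo,
# so look the child up by its string name instead.
foo = getattr(obj, "foo-foo")

# XML attributes are read with .get(), which accepts any string,
# so dashes in attribute names do not need special handling.
description = foo.get("foo-description")
doc_name = foo.bar.get("doc-name")
```

This keeps the document untouched, at the cost of losing plain dotted access for the dashed names.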

How can I remove the dashes from the tag names as well as from the attribute names?


Solution

  • It's hacky, but you could do something like this to replace each - in the element names with a _ (attribute names are left unchanged):

    from lxml import etree
    from lxml import objectify
    
    xml_string = """<root>
                       <foo-foo name="example" foo-description="description">
                           <bar doc-name="name" />
                           <test tag="test" />
                        </foo-foo>
                    </root>"""
    
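    # Parse with plain etree first so the tags can be renamed before objectifying.
    doc = etree.fromstring(xml_string)
    for elem in doc.iter():
        # Comments and processing instructions have a non-string .tag, so skip them.
        if isinstance(elem.tag, str) and '-' in elem.tag:
            elem.tag = elem.tag.replace('-', '_')
    
    # Serialize and reparse so that the result is an objectify tree.
    obj = objectify.fromstring(etree.tostring(doc))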
    

    There is probably a better way to get from the parsed XML document in doc to the objectified version without serializing and reparsing the XML, but this is the best I could come up with on short notice.
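The question also asks about attribute names. The sketch below extends the same rename-then-reparse idea to attributes. It is not part of the original answer: `underscore_names` is a hypothetical helper name, and it assumes the document uses no namespaces (a namespace URI containing a dash would be altered too) and that no renamed attribute collides with an existing one.

```python
from lxml import etree, objectify

xml_string = """<root>
  <foo-foo name="example" foo-description="description">
    <bar doc-name="name" />
    <test tag="test" />
  </foo-foo>
</root>"""


def underscore_names(xml_text):
    """Hypothetical helper: objectify xml_text after replacing '-' with '_'
    in every element name and attribute name (no namespace handling)."""
    doc = etree.fromstring(xml_text)
    for elem in doc.iter():
        # Comments and processing instructions have a non-string .tag.
        if not isinstance(elem.tag, str):
            continue
        elem.tag = elem.tag.replace("-", "_")
        # Copy the items first because the attribute mapping changes in the loop.
        for name, value in list(elem.attrib.items()):
            if "-" in name:
                del elem.attrib[name]
                elem.set(name.replace("-", "_"), value)
    # Serialize and reparse to obtain an objectify tree.
    return objectify.fromstring(etree.tostring(doc))


obj = underscore_names(xml_string)
print(obj.foo_foo.get("foo_description"))
print(obj.foo_foo.bar.get("doc_name"))
```

Renamed attributes are re-added with `set()`, so their order within an element may differ from the source document.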