Not able to retain &apos; when writing XML with ElementTree in Python


After parsing an XML string whose attribute value contains &apos; entities, I cannot retain the &apos; occurrences when I write the tree to a file or serialize it to a string: they come out as literal single quotes. I know the resulting XML is still valid, but I want to keep the &apos; entities so that a diff against the original stays clean. Any tips?

Here is a sample:

import xml.etree.ElementTree as ETree

xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
<testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
</testItem>
'''

root = ETree.fromstring(xml_str)
tree = ETree.ElementTree(root)
print(ETree.tostring(root, encoding='utf-8'))
# tree.write('output.xml')  # same result: literal single quotes are written to the file

I get the following output:

b'<testItem name="SomeName" description="Change to \'\'Calculated\'\' in the Diagram tab">\n</testItem>'


Solution

  • As a workaround, you can serialize with minidom and then re-escape the apostrophes yourself:

    from xml.dom.minidom import parseString
    
    xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
    <testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
    </testItem>
    '''
    
    dom = parseString(xml_str)
    pretty_xml = dom.toprettyxml(indent="  ", encoding="utf-8").decode("utf-8")
    
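    # minidom also writes literal apostrophes, so re-escape them as &apos;.
    # Note: this replaces every apostrophe in the output, not only those in attributes.
    pretty_xml = pretty_xml.replace("'", "&apos;")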
    
    print(pretty_xml)
    

    Output:

    <?xml version="1.0" encoding="utf-8"?>
    <testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
    </testItem>
    

    Without replace() you will get:

    <?xml version="1.0" encoding="utf-8"?>
    <testItem name="SomeName" description="Change to ''Calculated'' in the Diagram tab">
    </testItem>
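
The same post-processing idea also seems to work with ElementTree directly, since the parser has already decoded &apos; into a plain ' by the time the tree is built, and the serializer does not re-escape it. A minimal sketch follows, assuming that re-escaping every apostrophe in the output is acceptable. The helper name serialize_with_apos is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ETree

xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
<testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
</testItem>
'''


def serialize_with_apos(root):
    """Hypothetical helper: serialize root, then re-escape every apostrophe."""
    # encoding="unicode" makes tostring() return a str instead of bytes
    # (and omits the XML declaration).
    text = ETree.tostring(root, encoding="unicode")
    # ElementTree quotes attributes with double quotes, so any remaining
    # apostrophe is data and can safely be written as &apos;.
    return text.replace("'", "&apos;")


root = ETree.fromstring(xml_str)
result = serialize_with_apos(root)
print(result)
```

Because the replacement is applied to the whole serialized string, apostrophes in element text are escaped as well. That is still valid XML, but it may introduce diff noise if the original file only used &apos; in some places.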