After parsing an XML string whose attribute value contains &apos; entities,
I cannot retain the &apos; occurrences when I write it to a file or output it as a string.
I know the final XML is still valid, but for diffing purposes I want to keep the &apos;
values. Any tips?
Here is some sample code:
import xml.etree.ElementTree as ETree
xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
<testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
</testItem>
'''
root = ETree.fromstring(xml_str)
tree = ETree.ElementTree(root)
ETree.tostring(root, encoding='utf-8')
# tree.write('output.xml') # this also doesn't work, writes single quotes to the file
I get the following output:
b'<testItem name="SomeName" description="Change to \'\'Calculated\'\' in the Diagram tab">\n</testItem>'
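The likely reason is that &apos; is resolved to a plain ' when the document is parsed, so the tree no longer records which spelling the source used, and the serializer only escapes what it has to. A small check of this, assuming the standard CPython xml.etree.ElementTree (the element t and attribute d are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Same attribute value, spelled once with the entity and once with a literal apostrophe.
with_entity = ET.fromstring('<t d="&apos;x&apos;"/>')
with_literal = ET.fromstring('<t d="\'x\'"/>')

# After parsing, both attributes hold the identical Python string.
same_value = with_entity.get("d") == with_literal.get("d")
# Because the trees are identical, they serialize identically as well.
same_output = ET.tostring(with_entity) == ET.tostring(with_literal)
print(same_value, same_output)
```

Since the two inputs cannot be told apart after parsing, any fix has to happen after serialization.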
As a workaround you can use minidom:
from xml.dom.minidom import parseString
xml_str = '''<?xml version="1.0" encoding="UTF-8"?>
<testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
</testItem>
'''
dom = parseString(xml_str)
pretty_xml = dom.toprettyxml(indent=" ", encoding="utf-8").decode("utf-8")
# Ensure &apos; is retained
pretty_xml = pretty_xml.replace("'", "&apos;")
print(pretty_xml)
Output:
<?xml version="1.0" encoding="utf-8"?>
<testItem name="SomeName" description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">
</testItem>
Without replace() you will get:
<?xml version="1.0" encoding="utf-8"?>
<testItem name="SomeName" description="Change to ''Calculated'' in the Diagram tab">
</testItem>
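Note that a blanket replace() touches every apostrophe in the serialized document, including those in element text. If only attribute values should be re-escaped, a narrower post-processing step is possible. The following is a minimal sketch, not a general solution: to_string_with_apos is a hypothetical helper name, it assumes ElementTree output (which always double-quotes attribute values and escapes " inside them), and it would also rewrite element text that happens to look like ="...". It still escapes every apostrophe in an attribute value, because the tree does not record which ones were written as &apos; originally.

```python
import re
import xml.etree.ElementTree as ET

# Matches a double-quoted attribute value as ElementTree writes it: ="...".
_ATTR_VALUE = re.compile(r'="([^"]*)"')

def to_string_with_apos(root):
    """Serialize root, then re-escape apostrophes inside attribute values only."""
    xml_text = ET.tostring(root, encoding="unicode")
    return _ATTR_VALUE.sub(
        lambda m: '="' + m.group(1).replace("'", "&apos;") + '"',
        xml_text,
    )

xml_str = (
    '<testItem name="SomeName" '
    'description="Change to &apos;&apos;Calculated&apos;&apos; in the Diagram tab">'
    "it's</testItem>"
)
result = to_string_with_apos(ET.fromstring(xml_str))
print(result)
```

In this example the attribute value keeps its &apos; entities while the apostrophe in the element text is left as a literal character.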