I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<document>
<title>
<tag>A</tag>
X
</title>
</document>
and I'd like to read it with Python in such a way that I can reconstruct the file exactly. For example, I need to retain A as a <tag> element, and X as text.
The default XML implementation seems to have problems with the combination of <tag> and text in the <title> element. itertext() doesn't retain A as a <tag>, and regular iteration doesn't capture X at all:
import xml.etree.ElementTree as ET
tree = ET.parse("a.xml")
r = tree.getroot()
title = r[0]
print(list(title.itertext()))
print([c for c in title])
print(repr(title.text))
['\n ', 'A', '\n X\n ']
[<Element 'tag' at 0x7fc11c130c20>]
'\n '
Any hints?
ElementTree stores A as the text attribute of the <tag> element, and X as the tail attribute of that same element:
import xml.etree.ElementTree as ET
xml = ET.fromstring('<title><tag>A</tag>X</title>')
tag = xml.find("tag")
print(tag.text) # prints A
print(tag.tail) # prints X
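Building on that, here is a minimal sketch of how the mixed content of <title> could be walked in document order by combining text, the child elements, and each child's tail. The helper name mixed_content and the SOURCE string (the question's document without the XML declaration, and with the indentation left out) are assumptions made for illustration. Because text and tail are both kept, ET.tostring() should reproduce the element markup of this small example, although details such as the XML declaration or comments are not necessarily preserved by the default parser.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the question's file: no XML declaration, no indentation.
SOURCE = "<document>\n<title>\n<tag>A</tag>\nX\n</title>\n</document>"


def mixed_content(element):
    """Yield the direct content of `element` in document order.

    Text runs are yielded as str, child elements as Element objects.
    (Hypothetical helper, not part of ElementTree.)
    """
    if element.text:
        yield element.text          # text before the first child
    for child in element:
        yield child                 # the child element itself
        if child.tail:
            yield child.tail        # text between this child and the next


root = ET.fromstring(SOURCE)
title = root.find("title")
parts = list(mixed_content(title))

for part in parts:
    if isinstance(part, str):
        print("text:   ", repr(part))
    else:
        print("element:", part.tag, repr(part.text))

# Serializing writes text and tail back out, so the markup round-trips here.
roundtrip = ET.tostring(root, encoding="unicode")
print(roundtrip == SOURCE)
```

For nested documents the same helper can be applied recursively to each yielded Element.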