
Read Python XML with tag and text in one element


I have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <title>
    <tag>A</tag>
    X
  </title>
</document>

and I'd like to read it with Python in a way that would let me reconstruct the file exactly. For example, I need to keep A as the content of a <tag> element and X as plain text.

The standard library's xml.etree.ElementTree seems to have trouble with the mix of a <tag> child and plain text inside the <title> element: itertext() loses the information that A was inside a <tag>, and iterating over the element's children doesn't capture X at all:

import xml.etree.ElementTree as ET

tree = ET.parse("a.xml")
r = tree.getroot()

title = r[0]

print(list(title.itertext()))
print([c for c in title])
print(repr(title.text))

Output:

['\n    ', 'A', '\n    X\n  ']
[<Element 'tag' at 0x7fc11c130c20>]
'\n    '

Any hints?


Solution

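  • ElementTree stores A as the text attribute of the <tag> element and X as the tail attribute of that same element. The tail holds the text that follows an element's closing tag, up to the next sibling or the parent's closing tag, so mixed content like this is split between text and tail.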

    import xml.etree.ElementTree as ET
    xml = ET.fromstring('<title><tag>A</tag>X</title>')
    tag = xml.find("tag")
    print(tag.text) # prints A
    print(tag.tail) # prints X
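Applied to the file from the question, the same model explains the output above: title.text is only the leading whitespace, and X, together with the surrounding newlines and indentation, lives in the tail of <tag>. A minimal sketch, assuming the question's document is inlined as a string (the names source, root and tag are just for illustration), that checks the serializer writes every text and tail back out unchanged:

```python
import xml.etree.ElementTree as ET

# The document from the question, without the XML declaration,
# inlined so the example is self-contained.
source = """<document>
  <title>
    <tag>A</tag>
    X
  </title>
</document>"""

root = ET.fromstring(source)
tag = root.find("title/tag")

print(repr(tag.text))  # 'A'
print(repr(tag.tail))  # '\n    X\n  '  (X plus the surrounding whitespace)

# Serializing writes each element's text and tail back out,
# so this markup round-trips unchanged.
print(ET.tostring(root, encoding="unicode") == source)  # True
```

One caveat for an exact reconstruction: the <?xml ...?> declaration is not stored in the tree. ElementTree.write(..., xml_declaration=True, encoding="UTF-8") can emit one, but it uses single quotes, so that first line may not match the original byte for byte. Comments are also dropped by the default parser.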
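If you need to process the content rather than just re-serialize it, you can walk the tree and report each element's text and tail in document order. The generator walk below is a hypothetical helper, not part of ElementTree; as a simplification it strips whitespace and skips whitespace-only strings, so on its own it is not suitable for byte-exact reconstruction:

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def walk(elem: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, kind, value) in document order.

    kind is "start", "text", "end" or "tail"; whitespace-only
    text and tail strings are skipped (hypothetical helper).
    """
    yield depth, "start", elem.tag
    # Text directly inside elem, before its first child.
    if elem.text and elem.text.strip():
        yield depth + 1, "text", elem.text.strip()
    for child in elem:
        yield from walk(child, depth + 1)
    yield depth, "end", elem.tag
    # The tail is stored on elem but is content of elem's parent,
    # so it is reported after elem's closing tag, at elem's depth.
    if elem.tail and elem.tail.strip():
        yield depth, "tail", elem.tail.strip()


root = ET.fromstring("<title><tag>A</tag>X</title>")
for depth, kind, value in walk(root):
    print("  " * depth + f"{kind}: {value}")
```

For the one-line example from the answer this prints:

    start: title
      start: tag
        text: A
      end: tag
      tail: X
    end: title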