I am trying to apply logic to the element values in an XML file. Specifically, I want to encode all the values as UTF-8 without touching the element names or attributes.
Here is the sample XML:
<?xml version="1.0"?>
<sd_1>
  <sd_2>
    <sd_3>\311 is a fancy kind of E</sd_3>
  </sd_2>
</sd_1>
So far I have tried the following methods with no success:
First, I tried parsing the file with .parse, looping through each element, and retrieving the values with .text:
import xml.etree.ElementTree as ET
et = ET.parse('xml/test.xml')
for child in et.getroot():
    for core in child:
        core_value = str(core.text)
        core.text = core_value.encode('utf-8')
et.write('output.xml')
This produces an XML file in which the text \311 is not converted; it stays exactly as it was.
Next, I tried .iterparse with cElementTree, to no avail:
import xml.etree.cElementTree as etree
xml_file_path = 'xml/test.xml'
with open(xml_file_path) as xml_file:
    tree = etree.iterparse(xml_file)
    for items in tree:
        for item in items:
            print item.text
etree.write('output1.xml')
This results in:
"...print item.text\n', "AttributeError: 'str' object has no attribute 'text'..."
I am not sure what I am doing wrong there. I have seen multiple examples with the same arrangement, but when I print the items without .text, I see a tuple whose first item is the string 'end', and I think that is what breaks this method.
How do I properly iterate through my elements without specifying the element names (e.g. with .findall()) and apply logic to the value held in each element, so that the changes made during iteration are saved when I write the XML to a file?
Is this what you are looking for?
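# Python 2: str.decode() is not available on Python 3 strings
import xml.etree.ElementTree as ET
et = ET.parse('xml/test.xml')
for child in et.getroot():
    for core in child:
        core_value = str(core.text)
        # Interpret the literal backslash escape (\311) as the character it names
        core.text = core_value.decode('unicode-escape')
et.write('output.xml')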
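If you want this to work at any nesting depth without naming the tags, something along these lines may help. It is only a sketch under a few assumptions: it targets Python 3 (where str has no .decode), the helper names decode_escapes and decode_all_text are made up for illustration, and the sample XML is inlined as a string instead of being read from xml/test.xml. It only touches .text, so element names, attributes, and .tail are left alone.

```python
import io
import xml.etree.ElementTree as ET


def decode_escapes(text):
    """Turn literal backslash escapes such as \\311 into the characters they name."""
    if text is None:
        return None
    # Round-trip through bytes: backslashreplace keeps characters outside
    # Latin-1 as \uXXXX escapes, which unicode_escape then decodes back.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


def decode_all_text(tree):
    """Apply decode_escapes to the text of every element, whatever its tag or depth."""
    for elem in tree.iter():  # iter() walks the whole tree, no tag names needed
        elem.text = decode_escapes(elem.text)
    return tree


# Same structure as the sample in the question; "\\311" is a literal backslash + 311.
sample = "<sd_1><sd_2><sd_3>\\311 is a fancy kind of E</sd_3></sd_2></sd_1>"
tree = decode_all_text(ET.ElementTree(ET.fromstring(sample)))

# Write to an in-memory buffer here; pass a file name such as 'output.xml' instead
# to save to disk. encoding='utf-8' makes the output UTF-8 rather than ASCII
# with character references.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
```

A value that ends in a lone backslash will make the unicode_escape step raise an error, so real data may need a guard around it. As for the iterparse attempt: iterparse yields (event, element) tuples, which is why the inner loop hit the string 'end'; unpacking them as `for event, elem in etree.iterparse(xml_file):` gives you the element directly.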