I want to read a legacy XML file, modify it, and save it.
Here is my code:
```python
# cElementTree is deprecated and was removed in Python 3.9;
# plain ElementTree uses the C accelerator automatically.
import xml.etree.ElementTree as ET

NS = "{http://www.somedomain.com/XI/Traffic/10}"

def fix_xml(filename):
    f = ET.parse(filename)
    root = f.getroot()
    eventlist = root.findall("%(ns)sEvent" % {'ns': NS})
    xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns': NS}
    for event in eventlist:
        desc = event.find(xpath)
        desc.text = desc.text.upper()  # do some editing to the text
    ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")

fix_xml("test.xml")
```
The root tag of the file I load contains these attributes:

```xml
xmlns="http://www.somedomain.com/XI/Traffic/10"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"
```
I have the following problems related to namespaces:

- The output does not start with `<?xml version="1.0" encoding="utf-8"?>`, which I need at the beginning.
- Elements are written as `<ns0:eventDescription>`, while I need them as in the original, `<eventDescription>`, without a namespace prefix.

How can these be solved?
Have a look at the lxml tutorial's section on namespaces, and also at this article about namespaces in ElementTree.
Problem 1: Put up with it, like everybody else does. Instead of `"%(ns)sEvent" % {'ns': NS}`, try `NS + "Event"`.
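A minimal sketch of the `NS + "Event"` style with the standard-library `xml.etree.ElementTree`, run against a hypothetical sample document (the `Traffic` root and the text are made up; only the namespace URI and the element names come from the question). The last lines show a different technique not mentioned in the answer: in Python 3, `find`/`findall` also accept a prefix-to-URI mapping, so paths can use short prefixes instead of full `{uri}` names.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical sample input: the Traffic root and the text are made up;
# the namespace URI and the Event/EventDetail/EventDescription names
# come from the question.
SAMPLE = b"""<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">
  <Event>
    <EventDetail>
      <EventDescription>road closed</EventDescription>
    </EventDetail>
  </Event>
</Traffic>"""

root = ET.fromstring(SAMPLE)

# Clark notation: each qualified name is "{uri}localname", built by concatenation.
xpath = NS + "EventDetail/" + NS + "EventDescription"
for event in root.findall(NS + "Event"):
    desc = event.find(xpath)
    if desc is not None and desc.text:
        desc.text = desc.text.upper()

# Alternative: pass a prefix-to-URI mapping and use prefixed paths.
prefixes = {"t": "http://www.somedomain.com/XI/Traffic/10"}
print(root.find("t:Event/t:EventDetail/t:EventDescription", prefixes).text)  # prints: ROAD CLOSED
```

The prefix `t` is arbitrary; it only has to match a key in the mapping, not any prefix used in the file.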
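Problem 2: By default, the XML declaration is written only when it is required, i.e. when the output encoding is not US-ASCII or UTF-8, which is why `encoding="utf-8"` gives you none. You can force it by passing `xml_declaration=True` to your `write()` call. lxml supports this, and so does the standard library's `ElementTree.write()` since Python 2.7.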
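A small sketch of the difference, using the standard-library `ElementTree.write()` on a made-up two-element tree and writing to an in-memory `io.BytesIO` buffer so the result can be inspected (writing to a filename such as `"out.xml"` behaves the same way):

```python
import io
import xml.etree.ElementTree as ET

# Any tree will do; this tiny one is made up for illustration.
tree = ET.ElementTree(ET.fromstring(b"<root><child>text</child></root>"))

# Default (xml_declaration=None): no declaration for UTF-8 output.
plain = io.BytesIO()
tree.write(plain, encoding="utf-8")

# Forced: the declaration is always emitted.
forced = io.BytesIO()
tree.write(forced, encoding="utf-8", xml_declaration=True)

print(plain.getvalue())
print(forced.getvalue())
```

The standard library writes the declaration with single quotes (`<?xml version='1.0' encoding='utf-8'?>`), which is equivalent XML to the double-quoted form in the question.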
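Problem 3: The `nsmap` argument appears to be lxml-only, and even in lxml it is accepted when creating elements (`etree.Element(tag, nsmap=...)`), not by `ElementTree()`. AFAICT it needs a mapping, not a string, and the values are bare URIs without the braces, so try `nsmap={None: "http://www.somedomain.com/XI/Traffic/10"}`. (When lxml parses a file, it keeps the original namespace declarations, so output written from a parsed tree normally keeps the default namespace without a prefix.) The effbot article has a section describing a workaround for this.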
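The suggestion above concerns lxml. For the standard library, a different technique (not from the answer, and not necessarily the effbot workaround) is `ET.register_namespace`, available since Python 2.7 and 3.2: registering the URI under the empty prefix makes `write()` emit it as the default namespace, so elements come out unprefixed instead of with `ns0:`. A minimal sketch, again on a hypothetical sample document whose root attributes mirror the question's:

```python
import io
import xml.etree.ElementTree as ET

URI = "http://www.somedomain.com/XI/Traffic/10"
NS = "{%s}" % URI

# Map the URI to the empty prefix so it is written as xmlns="..." instead of
# an auto-generated ns0: prefix. This is module-global state and must run
# before write(). Also keep the conventional xsi prefix.
ET.register_namespace("", URI)
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Hypothetical sample input shaped like the question's root tag.
SAMPLE = b"""<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds">
  <Event><EventDetail><EventDescription>road closed</EventDescription></EventDetail></Event>
</Traffic>"""

tree = ET.ElementTree(ET.fromstring(SAMPLE))
desc = tree.getroot().find(".//" + NS + "EventDescription")
desc.text = desc.text.upper()

buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
out = buf.getvalue()
print(out.decode("utf-8"))
```

`write()` also has a `default_namespace=` parameter, but it raises `ValueError` if the tree contains any name without a namespace (including plain attributes such as `id="..."`), so registering the prefix tends to be the safer choice for arbitrary legacy files.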