I'm using cElementTree to extract XML tags and values in a loop and then storing them in a dictionary.
XML file contains:
<root>
<tag1>['item1', 'item2']</tag1>
<tag2>a normal string</tag2>
</root>
Python code (roughly):
import xml.etree.cElementTree as xml
xmldata = {}
xmlfile = xml.parse('XMLFile.xml')  # the file name must be passed as a string
for xmltag in xmlfile.iter():
    xmldata[xmltag.tag] = xmltag.text
The problem I have encountered is that the XML file contains different data types, including strings and lists. Unfortunately, Element.text returns every XML value as a string (including the lists).
So when I load from the XML file I have:
{'tag1':"['item1', 'item2']", 'tag2':'a normal string'}
When I'd prefer to have:
{'tag1':['item1', 'item2'], 'tag2':'a normal string'}
Is there an easy way to do this, e.g. a command that saves to the dictionary in the original format?
Or do I need to set up if statements to determine the value type and save it separately using an alternative to Element.text?
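You can use ast.literal_eval to try to parse Python literals. Since your plain strings are unquoted, literal_eval will raise a SyntaxError for them (or a ValueError for text that parses but is not a literal, such as a single bare word, or for an element with no text), but that is simple to work around:
import xml.etree.cElementTree as xml
from ast import literal_eval
xmldata = {}
xmlfile = xml.parse('XMLFile.xml')  # the file name must be passed as a string
for xmltag in xmlfile.iter():
    try:
        # Lists, numbers, dicts, etc. are converted to real Python objects
        xmldata[xmltag.tag] = literal_eval(xmltag.text)
    except (ValueError, SyntaxError):
        # Not a Python literal: keep the raw text as-is
        xmldata[xmltag.tag] = xmltag.text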
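Unlike Python's built-in eval, ast.literal_eval only accepts literals and does not evaluate arbitrary expressions, so it will not execute code even if the XML data comes from an untrusted source. Very large or deeply nested input can still exhaust memory or the interpreter stack, so it is not a complete defense against hostile data.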
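For reference, here is a minimal self-contained sketch of the same idea that runs without a file on disk. It parses the sample XML from the question out of a string and uses xml.etree.ElementTree, since cElementTree was removed in Python 3.9. The names XML_TEXT and parse_value are made up for this illustration and are not part of any library.

```python
import xml.etree.ElementTree as ET
from ast import literal_eval

# Sample document from the question, inlined so the example needs no file.
XML_TEXT = """<root>
<tag1>['item1', 'item2']</tag1>
<tag2>a normal string</tag2>
</root>"""


def parse_value(text):
    """Return the Python literal encoded in text, or text itself if it is not one.

    Hypothetical helper for this example only.
    """
    if text is None:
        # Empty elements have no text at all.
        return None
    try:
        return literal_eval(text.strip())
    except (ValueError, SyntaxError):
        # Plain unquoted strings end up here and are kept unchanged.
        return text


root = ET.fromstring(XML_TEXT)
# Iterating over an element yields its direct children, so the root is skipped.
xmldata = {child.tag: parse_value(child.text) for child in root}
print(xmldata)
```

One thing to watch for with this approach: any text that happens to look like a literal gets converted, so a value such as 42 comes back as an int rather than the string '42'. If some tags must always stay strings, you would need to special-case them.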