I have a huge and complex XML document that I want to parse into a dictionary (and later load into a database with SQLAlchemy). I want to use xmltodict for this task.
However, it seems that xmltodict cannot access deeply nested XML elements in a single chained lookup.
My MWE:
test.xml
<?xml version="1.0" encoding="UTF-8"?>
<ns1:tag-1>
<ns2:tag-2 attrib1="value" attrib2="value">
<ns3:tag-3 attrib3="value">blabla</ns3:tag-3>
</ns2:tag-2>
</ns1:tag-1>
test.py
import xmltodict as x2d
with open('ESCIDOC_test.xml', encoding='utf-8') as purein:
    doc = x2d.parse(purein.read())
print(doc['ns1:tag-1']['ns2:tag-2']['@attrib2']) # works
print(doc['ns1:tag-1']['ns2:tag-2']['ns3:tag-3']['#text'] # does not work, TypeError
ns3tree = doc['ns1:tag-1']['ns2:tag-2']['ns3:tag-3']
print(ns3tree['#text']) # works
Why do I need to assign it to a new variable first to make it work? The whole XML is parsed anyway, isn't it?
print(doc)
# OrderedDict([('ns1:tag-1', OrderedDict([('ns2:tag-2', OrderedDict([('@attrib1', 'value'), ('@attrib2', 'value'), ('ns3:tag-3', OrderedDict([('@attrib3', 'value'), ('#text', 'blabla')]))]))]))])
Is this intended because of possible memory issues? Is there a more elegant workaround?
You left out the closing ) on the line that does not work.
I used Python 3.5 and copied your files, adding the closing ) and changing ESCIDOC_test.xml to test.xml. When I ran it, all 3 print statements worked correctly (no TypeError).
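To illustrate why the intermediate variable should not matter, here is a minimal sketch that rebuilds by hand the structure shown by print(doc) in the question, so it runs without xmltodict installed. The chained lookup and the two-step lookup return the same value. The dig helper is hypothetical (not part of xmltodict); it only reports which step of a chain fails. If a TypeError does appear on the full document, one possible cause is that a node at that point is a list (repeated sibling elements) or a plain string (an element with text but no attributes) rather than a dict.

```python
from collections import OrderedDict

# Hand-built copy of the structure printed by print(doc) in the question.
doc = OrderedDict([
    ("ns1:tag-1", OrderedDict([
        ("ns2:tag-2", OrderedDict([
            ("@attrib1", "value"),
            ("@attrib2", "value"),
            ("ns3:tag-3", OrderedDict([
                ("@attrib3", "value"),
                ("#text", "blabla"),
            ])),
        ])),
    ])),
])

# Chained lookup in a single expression (note the closing parenthesis below).
chained = doc["ns1:tag-1"]["ns2:tag-2"]["ns3:tag-3"]["#text"]

# The same lookup through an intermediate variable.
ns3tree = doc["ns1:tag-1"]["ns2:tag-2"]["ns3:tag-3"]
stepwise = ns3tree["#text"]

print(chained)
print(stepwise)


def dig(node, *keys):
    """Hypothetical helper: follow keys one by one and say which step failed."""
    for depth, key in enumerate(keys):
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError) as exc:
            raise LookupError(
                "lookup failed at step {} ({!r}): {!r}".format(depth, key, exc)
            ) from exc
    return node


print(dig(doc, "ns1:tag-1", "ns2:tag-2", "ns3:tag-3", "#text"))
```

Calling dig with a path that runs into a string, for example dig(doc, "ns1:tag-1", "ns2:tag-2", "@attrib2", "#text"), raises a LookupError naming the failing step, which can help locate the offending node in a large document.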