I have a network application (using Twisted) that receives XML in chunks over the internet; a complete document may not arrive in a single packet. My plan is to build up the XML message as it is received. I've "settled" on iterparse from xml.etree.ElementTree. I've been experimenting, and the following (non-Twisted) code works fine:
import xml.etree.ElementTree as etree
from io import StringIO
buff = StringIO(unicode('<notorious><burger/></notorious>'))
for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))
Then I built the following code to simulate how data may be received on my end:
import xml.etree.ElementTree as etree
from io import StringIO
chunks = ['<notorious>','<burger/>','</notorious>']
buff = StringIO()
for ch in chunks:
    buff.write(unicode(ch))
    if buff.getvalue() == '<notorious><burger/></notorious>':
        print("it should work now")
        try:
            for event, elem in etree.iterparse(buff, events=('end',)):
                if elem.tag == 'notorious':
                    print(etree.tostring(elem))
        except Exception as e:
            print(e)
But the code spits out:
'no element found: line 1, column 0'
I can't wrap my head around it. Why does that error occur when the StringIO in the second sample has the same contents as the StringIO in the first sample?
Thanks
File objects and file-like objects have a file position, and every read or write advances it. After the writes in your loop, the position is at the end of the buffer, so etree.iterparse finds nothing left to read and reports 'no element found'. You need to reset the file position (using <file_object>.seek(..)) before passing the file object to etree.iterparse so that it reads from the beginning of the file.
...
buff.seek(0) # <-----
for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))
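For reference, here is a minimal self-contained sketch of the same fix applied to the chunked example from the question. It assumes Python 3, where str is already Unicode, so the unicode(...) calls are dropped; the names end_position, leftover and parsed are only illustrative. It also reads from the buffer right after the writes to show that the position really is at the end:

```python
import xml.etree.ElementTree as etree
from io import StringIO

chunks = ['<notorious>', '<burger/>', '</notorious>']

buff = StringIO()
for ch in chunks:
    # Each write advances the file position past the written text.
    buff.write(ch)

# The position now sits at the end of the buffer, so reading
# from here returns an empty string.
end_position = buff.tell()
leftover = buff.read()

# Rewind to the beginning so iterparse sees the whole document.
buff.seek(0)

parsed = []
for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        parsed.append(etree.tostring(elem))

print(end_position, repr(leftover))
print(parsed)
```

Without the buff.seek(0) line, iterparse starts reading at the end of the buffer and raises the 'no element found' error from the question.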