I don't understand why this works:
content = urllib2.urlopen(url)
context = etree.iterparse(content, tag='{my_ns}my_first_tag')
context = iter(context)
# for event, elem in context:
#     pass
context = etree.iterparse(content, tag='{my_ns}my_second_tag')
for event, elem in context:
    pass
whereas this doesn't:
content = urllib2.urlopen(url)
context = etree.iterparse(content, tag='{my_ns}my_first_tag')
context = iter(context)
for event, elem in context:
    pass
context = etree.iterparse(content, tag='{my_ns}my_second_tag')
for event, elem in context:
    pass
and gives me this error:
XMLSyntaxError: Extra content at the end of the document, line 1, column 1
Can I not parse the same content twice? It seems strange that it works when I comment out only the loop rather than the whole iterparse call.
Do I need to close something first?
Many thanks
urllib2.urlopen gives you a file-like object that you can use to read the contents of the URL you're querying.
I'm guessing here that etree.iterparse returns an object that can be iterated but doesn't touch content at all until you actually iterate over it. In that case, the first loop uses context to iterate over the contents of content, "consuming" the data as it goes.
When you create the second context, you're passing the same content, which is "empty" by then. That would also explain why your first snippet works: with the loop commented out, the first context never reads anything, so the second one still sees the whole stream.
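To illustrate the "consuming" part, here is a minimal sketch in Python 3 that uses io.BytesIO as a stand-in for the object returned by urlopen (the sample bytes are made up). It only shows how a file-like object behaves once it has been read. It does not call lxml at all.

```python
import io

# Stand-in for the response object: a file-like object keeps a read
# position that only moves forward as data is read.
content = io.BytesIO(b"<root><a/><b/></root>")

first = content.read()   # reads everything; the position is now at the end
second = content.read()  # nothing is left, so this returns empty bytes

print(first)
print(second)
```

A parser handed content after the first read gets no data, which fits the error you're seeing.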
Edit: since you asked for ways to reparse: one option is to read the whole response into memory and then pass it separately to each iterparse call, using StringIO as the file-like object. E.g.:
from StringIO import StringIO
# ...
# Read the whole response into memory once.
data = content.read()
context = etree.iterparse(StringIO(data), tag='{my_ns}my_first_tag')
# processing...
context = etree.iterparse(StringIO(data), tag='{my_ns}my_second_tag')
# processing...
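For reference, the same idea as a self-contained sketch for Python 3, where the response is bytes and io.BytesIO takes the place of StringIO. Two assumptions: the sample document and the helper name texts_for_tag are made up, and I'm using the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages. The standard library's iterparse has no tag argument, so the filtering is done by hand.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for the downloaded response.
data = (
    b'<root xmlns="my_ns">'
    b'<my_first_tag>1</my_first_tag>'
    b'<my_second_tag>2</my_second_tag>'
    b'<my_first_tag>3</my_first_tag>'
    b'</root>'
)

def texts_for_tag(raw, tag):
    """Parse raw bytes from the start and collect the text of matching elements."""
    found = []
    # A fresh BytesIO per call means every parse starts at position 0.
    for event, elem in ET.iterparse(io.BytesIO(raw), events=("end",)):
        if elem.tag == tag:
            found.append(elem.text)
    return found

first = texts_for_tag(data, "{my_ns}my_first_tag")
second = texts_for_tag(data, "{my_ns}my_second_tag")
print(first, second)
```

Each call wraps the same bytes in a new file-like object, so neither parse depends on where the other one stopped reading.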