Iterparse returns empty iterable when parsing xml with a default namespace


I'm parsing an xml document using iterparse.

from lxml import etree
import tempfile

content = """<root xmlns="blah.com">
   <foo>
      <attribute id="3" />
   </foo>
   <foo>
      <structure>
         <baz>
            <x>g</x>
         </baz>
      </structure>
   </foo>
</root>"""

src_file = tempfile.NamedTemporaryFile()
src_file.write(content.encode("utf-8"))  # the temp file is opened in binary mode
src_file.flush()

context = etree.iterparse(
        src_file.name,
        events=("end", ),
        tag="foo",
    )

for event, element in context:
    print(event)
    print(element)
  • Expected result: I see a few end events
  • Actual result: nothing happens

A few things I tried:

  • If I remove the namespace from the XML, it works fine.
  • If I use a namespace with a prefix, like xmlns:t="blah.com", it also works fine.
  • Removing tag="foo" also makes it work fine.

However, I would like to use both a tag filter and a default namespace. Is this a bug in iterparse, or am I doing something wrong?

Edit: edited the code to make it copy-pasteable without indentation errors.


Solution

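  • Ah, the problems with parsers! Your tag must be the fully qualified name, not just the local name. A default namespace applies to every unprefixed element, so each <foo> here is really {blah.com}foo, and tag="foo" only matches a foo in no namespace. That is also why the prefixed form xmlns:t="blah.com" worked: the unprefixed <foo> elements stayed outside the namespace. Use your namespace in the tag like so: tag="{blah.com}foo".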
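
A minimal sketch of the fix, assuming Python 3 and a reasonably recent lxml. It parses the same document from an in-memory io.BytesIO buffer instead of a temporary file, and the helper name matching_tags is made up for this illustration. It runs iterparse once with the bare tag and once with the namespace-qualified tag so the difference is visible side by side:

```python
import io

from lxml import etree

NS = "blah.com"  # the default namespace declared on <root>

content = b"""<root xmlns="blah.com">
   <foo>
      <attribute id="3" />
   </foo>
   <foo>
      <structure>
         <baz>
            <x>g</x>
         </baz>
      </structure>
   </foo>
</root>"""


def matching_tags(xml_bytes, tag):
    """Collect the tag of every element whose 'end' event passes the filter."""
    context = etree.iterparse(io.BytesIO(xml_bytes), events=("end",), tag=tag)
    return [element.tag for _event, element in context]


# Bare local name: only matches <foo> elements that are in no namespace.
unqualified = matching_tags(content, "foo")

# Qualified name in {namespace}localname form: matches the two <foo> elements.
qualified = matching_tags(content, "{%s}foo" % NS)

print(unqualified)
print(qualified)
```

The first list should come back empty and the second should hold two entries, matching the behaviour described in the question. Building the tag from a constant such as NS keeps the namespace URI in one place if several tags need filtering.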