I am working on an XML parser. The goal is to parse a number of different XML files in which prefixes and tags remain consistent but namespaces change.
I am hence trying either to parse tags such as <prefix:tags> without resolving (replacing) the prefix with the namespace, since the prefixes remain unchanged from document to document, or to find a way in which the prefix of each tag (e.g. <prefix:tag>) could be replaced with the proper namespace. I have tried with xml.etree.ElementTree.
I also had a look at lxml. I did not find any configuration option of the XMLParser in lxml that could help me out, although here I could read an answer where the author suggests that lxml should be able to collect namespaces for me automatically.
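To make the idea of collecting namespaces automatically concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree rather than lxml. It relies on the "start-ns" events of iterparse to build a prefix-to-URI mapping, which is then passed to findall so that the unchanged prefix can be used in queries. The sample document, its URI, and the helper name collect_namespaces are assumptions made for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the prefix "p" is stable, the URI differs per file.
SAMPLE = b"""<p:root xmlns:p="http://example.com/schema/v2">
  <p:item>first</p:item>
  <p:item>second</p:item>
</p:root>"""


def collect_namespaces(source):
    """Return a {prefix: uri} mapping for the xmlns declarations in source."""
    namespaces = {}
    # "start-ns" events yield (prefix, uri) tuples as declarations are read.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


nsmap = collect_namespaces(io.BytesIO(SAMPLE))
root = ET.parse(io.BytesIO(SAMPLE)).getroot()

# The prefix is resolved through the collected mapping at query time.
items = root.findall("p:item", nsmap)
print(nsmap)
print([item.text for item in items])
```

Note that this sketch keeps only the last URI seen for each prefix, so it would not be enough for a document that redeclares the same prefix with different URIs in nested elements.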
Interestingly, parsed_file = etree.XML(file) fails with the error:
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
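One possible cause, assuming that file holds a path rather than the document text: XML() parses its argument as XML content, so a filename does not start with '<' and is rejected, whereas parse() opens the file itself. The sketch below shows the distinction with the standard library's xml.etree.ElementTree; lxml.etree offers XML() and parse() entry points with the same names, so the same distinction should apply there. The document content and the file name sample.xml are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical document written to a temporary file for the demonstration.
DOC = '<p:root xmlns:p="http://example.com/ns"><p:item/></p:root>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(DOC)

    # XML() parses its argument as document text, so a path is rejected.
    try:
        ET.XML(path)
        path_parsed_as_text = True
    except ET.ParseError:
        path_parsed_as_text = False

    # parse() takes a filename (or file object) and reads the content itself.
    root_from_file = ET.parse(path).getroot()

# XML() works once it is given the content instead of the path.
root_from_text = ET.XML(DOC)

print(path_parsed_as_text)
print(root_from_file.tag == root_from_text.tag)
```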
One example of the files I would like to parse is here
items = tree.xpath("*[local-name(.) = 'a_tag_goes_here']") did the job. On top of that, I had to browse the generated list, items, manually to define my other desired filtering functions.
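For comparison, a rough standard-library equivalent of matching on the local name and then filtering the resulting list could look like the sketch below. It is not the lxml XPath call itself: it walks all descendants rather than only the children, and the helper names local_name and find_by_local_name, the predicate argument, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the URI would change from file to file.
DOC = """<p:root xmlns:p="http://example.com/ns/v1">
  <p:a_tag_goes_here id="1">keep</p:a_tag_goes_here>
  <p:other>skip</p:other>
  <p:a_tag_goes_here id="2">drop</p:a_tag_goes_here>
</p:root>"""


def local_name(element):
    """Strip the '{uri}' part that ElementTree puts in front of a tag."""
    return element.tag.rpartition("}")[2]


def find_by_local_name(root, name, predicate=lambda element: True):
    """Return elements whose local name matches and that pass predicate."""
    return [
        element
        for element in root.iter()
        if local_name(element) == name and predicate(element)
    ]


root = ET.fromstring(DOC)
items = find_by_local_name(root, "a_tag_goes_here")
kept = find_by_local_name(root, "a_tag_goes_here", lambda e: e.text == "keep")
print([e.get("id") for e in items])
print([e.get("id") for e in kept])
```

Passing the extra filter as a predicate keeps the namespace handling in one place, so further filtering functions only need to look at the element itself.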