
Access the processing-instructions before/after a root element with lxml

Using lxml, how can I access/iterate the processing-instructions located before the root open tag or after the root close tag?

I have tried this but, according to the documentation, it only iterates inside the root element:

import io

from lxml import etree

content = """\
<?before1?><?before2?><root>text</root><?after1?><?after2?>
"""

source = etree.parse(io.StringIO(content))

print(etree.tostring(source, encoding="unicode"))
# -> <?before1?><?before2?><root>text</root><?after1?><?after2?>

for node in source.iter():
    print(type(node))
# -> <class 'lxml.etree._Element'>

My only solution is to wrap the XML with a dummy element:

dummy_content = "<dummy>{}</dummy>".format(etree.tostring(source, encoding="unicode"))
dummy = etree.parse(io.StringIO(dummy_content))

for node in dummy.iter():
    print(type(node))
# -> <class 'lxml.etree._Element'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._Element'>
#    <class 'lxml.etree._ProcessingInstruction'>
#    <class 'lxml.etree._ProcessingInstruction'>

Is there a better solution?


  • You can use the getprevious() and getnext() methods on the root element.

    before2 = source.getroot().getprevious()
    before1 = before2.getprevious()
    after1 = source.getroot().getnext()
    after2 = after1.getnext()


    Using XPath (on the ElementTree or Element instance) is also possible:

    before = source.xpath("preceding-sibling::node()")  # List of two PIs
    after = source.xpath("following-sibling::node()")
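
When the number of processing instructions is not known in advance, chaining getprevious() and getnext() calls by hand gets awkward. A minimal sketch of one alternative, assuming lxml's Element.itersiblings() and a one-line sample document that mirrors the one in the question (the names `before` and `after` are only illustrative):

```python
import io

from lxml import etree

# Sample document assumed for illustration, mirroring the one in the question.
content = "<?before1?><?before2?><root>text</root><?after1?><?after2?>"
source = etree.parse(io.StringIO(content))
root = source.getroot()

# itersiblings(preceding=True) walks backwards from the root (nearest first),
# so the list is reversed to restore document order.
# Passing etree.ProcessingInstruction as the tag filter skips top-level comments.
before = list(root.itersiblings(etree.ProcessingInstruction, preceding=True))[::-1]
after = list(root.itersiblings(etree.ProcessingInstruction))

for pi in before + after:
    print(type(pi).__name__, pi.target)
```

This avoids re-serializing and re-parsing the document inside a dummy wrapper element, and it should work for any number of processing instructions on either side of the root.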