Search code examples
pythonxpathlxml

Using XPath with Python's lxml module, how to validate a node's path?


I'm using XPath with Python's lxml module, and have the following xml code.

<library>
  <section1>
    <book>
      <title>Harry Potter</title>
      <author>J.K. Rowling</author>
    </book>
  </section1>
  <section2>
    <book>
      <title>Sapiens</title>
      <author>Yuval Noah Harari</author>
    </book>
  </section2>
</library>

Now I have some title nodes: titles = root.xpath('//title'). How to validate if a title is a descendant of the child node section1 of the root of the xml? For some reason I have to get the titles first with //title, then do the validation, rather than find them directly by their full xml paths. The validation code should be like this:

titles = root.xpath('//title')
if title.xpath('.[ancestor::/library/section1]'):
    do_sth()

The above code results in a XPathEvalError: Invalid expression.


Solution

  • As for your attempt to write a relative XPath, in XPath 1.0 (which lxml supports), you can use e.g.

    from lxml import etree as ET
    
    root = ET.parse('librarySample1.xml')
    
    titles = root.xpath('//title')
    
    for title in titles:
        if title.xpath('self::node()[ancestor::section1/parent::library]'):
            print(title)
    

    Or switch to the current version of XPath, XPath 3.1 by using saxonche where you can use e.g.

    from saxonche import PySaxonProcessor
    
    with PySaxonProcessor() as saxon_proc:
        xpath_proc = saxon_proc.new_xpath_processor()
    
        xpath_proc.set_context(xdm_item=saxon_proc.parse_xml(xml_file_name='librarySample1.xml'))
    
        titles = xpath_proc.evaluate('//title')
    
        for title in titles:
            xpath_proc.set_context(xdm_item=title)
            if xpath_proc.evaluate_single('.[ancestor::section1/parent::library]'):
                print(title)