How to get all XPaths from XML with just key names and no template URLs, with Python

I need to extract XPaths and values from an XML object. Currently I use lxml, which gives either long paths with repeated template URLs or just the indices of the XPath keys without their names.

Question: how can I get XPaths with just the element names, without the template URLs? String cleanup after parsing works, but I hope to find a clean solution using lxml or a similar library.

  1. With getelementpath(): the paths contain the template URLs, and empty keys have '\n\t\t' as their text.
>> [(root1.getelementpath(e), e.text) for e in root1.iter()][5:10]

 ('{}territory', '\n\t\t'),
  2. With getpath(): the paths have no key names (only * wildcards), and empty keys still have '\n\t\t' as their text.
>> [(root1.getpath(e), e.text) for e in root1.iter()][5:10]

[('/*/*[2]/*[1]/*', 'ISO_639-1'),
 ('/*/*[2]/*[2]', 'xx'),
 ('/*/*[3]', '\n\t\t'),
 ('/*/*[3]/*[1]', '\n\t\t\t'),
 ('/*/*[3]/*[1]/*', 'ISO_3166-1')]
  3. What I need: key names without URLs, and None for empty keys. I believe I've seen this somewhere, but can't find it now...
[('language/terminology_id/value', 'ISO_639-1'),
 ('territory', None),
 ('territory/terminology_id', None),
 ('territory/terminology_id/value', 'ISO_3166-1')]
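To make the two behaviours described in the list reproducible, here is a minimal sketch on a small hypothetical document. The namespace URI http://example.com/ns is an assumption standing in for the real template URL, which the question does not show. getelementpath() is a method of the ElementTree (not of the element) and writes every step in {uri}name form, while getpath() falls back to * for elements in a default namespace.

```python
from lxml import etree

# Hypothetical sample document; the namespace URI is an assumption standing in
# for the real template URL, which the question does not show.
SAMPLE = b"""<?xml version="1.0"?>
<Lab_test_results xmlns="http://example.com/ns">
    <language>
        <terminology_id>
            <value>ISO_639-1</value>
        </terminology_id>
    </language>
</Lab_test_results>"""

tree = etree.ElementTree(etree.fromstring(SAMPLE))
value = tree.find(".//{http://example.com/ns}value")

# ElementPath syntax, relative to the root; every step carries the full URI
element_path = tree.getelementpath(value)
# XPath from the document root; default-namespace elements become *
xpath = tree.getpath(value)

print(element_path)
print(xpath)
print(repr(tree.getroot()[0].text))  # whitespace-only text of <language>
```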

This is the XML header:

<?xml version="1.0" ?>
<Lab test results
        <value>Lab test results</value>


  • I'd still use .getpath().

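    The reason you're getting * in your paths is that your XML has a default namespace. XPath 1.0 cannot name an element in a default namespace without a prefix, so getpath() writes * (plus positional predicates such as [2] where siblings need to be told apart). That way the generated XPath is still usable without taking the namespace into account.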
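A quick way to confirm that the default namespace is what triggers the *: the sketch below runs getpath() on three tiny hypothetical documents that differ only in how (or whether) they are namespaced. The helper name path_to_value and the URI urn:example are illustrative assumptions.

```python
from lxml import etree

def path_to_value(xml: bytes) -> str:
    """Hypothetical helper: getpath() of the first element whose local name is 'value'."""
    tree = etree.fromstring(xml).getroottree()
    value = next(e for e in tree.iter(etree.Element)
                 if etree.QName(e).localname == "value")
    return tree.getpath(value)

no_ns = path_to_value(b"<root><name><value>x</value></name></root>")
default_ns = path_to_value(b'<root xmlns="urn:example"><name><value>x</value></name></root>')
prefixed = path_to_value(
    b'<p:root xmlns:p="urn:example"><p:name><p:value>x</p:value></p:name></p:root>')

print(no_ns)       # no namespace: plain element names
print(default_ns)  # default namespace: every step is *
print(prefixed)    # prefixed namespace: getpath() can write p:name
```

Only the default-namespace case loses the names, which matches what the question shows.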

    To resolve this, first set each element's name (.tag) to its local name (the element name without a prefix or URI).
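To see what that renaming does to a single element, here is a small sketch on a hypothetical two-element document (the URI urn:example is an assumption). etree.QName splits the {uri}name tag into its parts, and assigning the bare local name back to .tag removes the namespace from that element only.

```python
from lxml import etree

root = etree.fromstring(b'<results xmlns="urn:example"><name/></results>')
child = root[0]
tree = root.getroottree()

clark_tag = child.tag                 # '{urn:example}name'
local = etree.QName(child).localname  # 'name'
before = tree.getpath(child)          # '/*/*'

child.tag = local                     # drops the namespace from this element only
after = tree.getpath(child)           # '/*/name': the root is still namespaced
```

Because ancestors keep their namespace until they are renamed too, the loop needs to rename every element, not just the leaves.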

    Also, you can create an XMLParser with remove_blank_text=True to drop whitespace-only text, so elements that have no text of their own get None instead of '\n\t\t'.
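The effect of remove_blank_text can be checked in isolation. This minimal sketch parses the same hypothetical tab-indented snippet twice, once with the default parser and once with remove_blank_text=True, and compares the text of the container element. libxml2 only removes whitespace it considers ignorable, so text in genuinely mixed content is kept.

```python
from lxml import etree

XML = b"<root>\n\t<name>\n\t\t<value>x</value>\n\t</name>\n</root>"

default_name = etree.fromstring(XML).find("name")
stripped_name = etree.fromstring(
    XML, etree.XMLParser(remove_blank_text=True)).find("name")

print(repr(default_name.text))   # the indentation before <value>
print(repr(stripped_name.text))  # None: the blank text node was dropped
```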


    XML Input (test.xml)

            <value>Lab test results</value>


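    from lxml import etree
    from pprint import pprint

    # remove_blank_text drops whitespace-only text, so "empty" elements get None
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse('test.xml', parser=parser)

    xpaths = []
    # etree.Element as the tag filter yields elements only; comments and
    # processing instructions have no string tag, so QName() would raise on them
    for elem in tree.iter(etree.Element):
        # strip the namespace so getpath() writes the name instead of *
        elem.tag = etree.QName(elem).localname
        xpaths.append((tree.getpath(elem), elem.text))

    pprint(xpaths)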

    Printed Output

    [('/Lab_test_results', None),
     ('/Lab_test_results/name', None),
     ('/Lab_test_results/name/value', 'Lab test results'),
     ('/Lab_test_results/language', None),
     ('/Lab_test_results/language/terminology_id', None),
     ('/Lab_test_results/language/terminology_id/value', 'ISO_639-1')]
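If you would rather not modify the parsed tree, or want paths relative to the root like the ones in the question, one possible alternative is to join the local names of each element's ancestors yourself. This is a sketch, not an lxml feature: local_paths is a hypothetical helper, the sample uses an assumed URI urn:example, and because it writes no positional indices, repeated sibling names produce identical paths (getpath() would add [1], [2] to tell them apart).

```python
from lxml import etree

def local_paths(tree):
    """Hypothetical helper: (path, text) pairs built from local names only.

    Paths are relative to the root element and carry no positional indices,
    so repeated sibling names collapse to the same path. The tree is not modified.
    """
    pairs = []
    for elem in tree.getroot().iter(etree.Element):  # elements only
        # local names of elem and all its ancestors, innermost first
        names = [etree.QName(elem).localname]
        names += [etree.QName(a).localname for a in elem.iterancestors()]
        names.pop()  # drop the root element's name
        if not names:
            continue  # skip the root element itself
        pairs.append(("/".join(reversed(names)), elem.text))
    return pairs

SAMPLE = b"""<Lab_test_results xmlns="urn:example">
    <language>
        <terminology_id><value>ISO_639-1</value></terminology_id>
    </language>
    <territory>
        <terminology_id><value>ISO_3166-1</value></terminology_id>
    </territory>
</Lab_test_results>"""

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.fromstring(SAMPLE, parser).getroottree()
pairs = local_paths(tree)
```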

    If you also need to collect attributes, you can make a few small changes:

    for elem in tree.iter(etree.Element):
        elem.tag = etree.QName(elem).localname
        xpath = tree.getpath(elem)
        xpaths.append((xpath, elem.text))
        # one extra entry per attribute, addressed with XPath's @name syntax
        for attr, value in elem.attrib.items():
            xpaths.append((f"{xpath}/@{attr}", value))
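For a self-contained run of the attribute variant, the sketch below applies the same loop to a small hypothetical document with an assumed default namespace urn:example and two unprefixed attributes.

```python
from lxml import etree

XML = b'<results xmlns="urn:example"><value units="mg" code="42">7</value></results>'

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.fromstring(XML, parser).getroottree()

xpaths = []
for elem in tree.iter(etree.Element):
    elem.tag = etree.QName(elem).localname  # drop the default namespace
    xpath = tree.getpath(elem)
    xpaths.append((xpath, elem.text))
    for attr, value in elem.attrib.items():  # attributes in document order
        xpaths.append((f"{xpath}/@{attr}", value))

print(xpaths)
```

Unprefixed attributes are never in a namespace, so their names come through as-is. A namespaced attribute such as xml:lang would appear in {uri}name form (for example {http://www.w3.org/XML/1998/namespace}lang), which is not valid XPath and would need separate handling.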