python, xml, lxml

lxml XPath can't find a tag below the first one in XML


I have an XML document that looks something like this:

<MyXmlRoot>
<App xmlns='urn:SomethingSomething1'>
    ...
</App>
<User xmlns='urn:SomethingSomething2'>
    ...
</User>
<Doc xmlns='urn:SomethingSomething3'>
    <level2>
        <level3>
            <level4>
                <level5>
                    <level6>
                        <level7>
                            <level8>
                                <level9>
                                    <level10>Content at the deepest level</level10>
                                </level9>
                            </level8>
                        </level7>
                    </level6>
                </level5>
            </level4>
        </level3>
    </level2>
</Doc>
</MyXmlRoot>

I use lxml to read and parse it like this:

from lxml import etree

tree = etree.parse("textxml.xml")
root = tree.getroot()

If I pretty-print from root, it shows the entire XML, which is good. But when I try to read specific tag values like so:

content = root.xpath('//level10/text()')

XPath can't find any tag below the root and returns an empty list. I suspect it's because of the namespaces, but I can't find a way to make XPath read the values. Any advice?
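A quick way to test the namespace suspicion is to print the tag names lxml actually assigns. The sketch below uses a trimmed-down, hypothetical copy of the document above rather than the real file:

```python
from lxml import etree

# Hypothetical trimmed-down version of the question's document
xml_data = """<MyXmlRoot>
<Doc xmlns='urn:SomethingSomething3'>
    <level2>
        <level10>Content at the deepest level</level10>
    </level2>
</Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# Collect the tag names lxml assigns; namespaced tags use {uri}localname (Clark notation)
tags = [el.tag for el in root.iter()]
print(tags)
# ['MyXmlRoot', '{urn:SomethingSomething3}Doc',
#  '{urn:SomethingSomething3}level2', '{urn:SomethingSomething3}level10']

# An unprefixed name test in XPath 1.0 only matches elements in no namespace,
# so the query from the question finds nothing
print(root.xpath('//level10/text()'))  # []
```

Every element from <Doc> downward inherits the default namespace declared on <Doc>, which is why a plain //level10 never matches.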


Solution

  • Because <level10> inherits the default namespace declared on <Doc>, qualify the tag you search for with that namespace URI, written as {urn:SomethingSomething3}:

    from lxml import etree
    
    xml_data = """
    <MyXmlRoot>
        <App xmlns='urn:SomethingSomething1'>
        </App>
        <User xmlns='urn:SomethingSomething2'>
        </User>
        <Doc xmlns='urn:SomethingSomething3'>
            <level2>
                <level3>
                    <level4>
                        <level5>
                            <level6>
                                <level7>
                                    <level8>
                                        <level9>
                                            <level10>Content at the deepest level</level10>
                                        </level9>
                                    </level8>
                                </level7>
                            </level6>
                        </level5>
                    </level4>
                </level3>
            </level2>
        </Doc>
    </MyXmlRoot>
    """
    
    root = etree.fromstring(xml_data)
    
    level10_text = root.find(".//{urn:SomethingSomething3}level10").text
    print("Text from <level10> tag:", level10_text)
    

    Prints:

    Text from <level10> tag: Content at the deepest level
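To keep using root.xpath() as in the question, you can instead pass lxml a namespaces mapping from a prefix to the URI. XPath 1.0 has no notion of a default namespace for name tests, so some prefix is required; the prefix d below is an arbitrary choice for illustration. A minimal sketch on a trimmed-down, hypothetical copy of the document:

```python
from lxml import etree

# Hypothetical trimmed-down version of the question's document
xml_data = """<MyXmlRoot>
<Doc xmlns='urn:SomethingSomething3'>
    <level2>
        <level10>Content at the deepest level</level10>
    </level2>
</Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# Map an arbitrary prefix ("d") to the namespace URI declared on <Doc>
ns = {"d": "urn:SomethingSomething3"}

# The prefix only exists in this query; it does not have to appear in the document
content = root.xpath("//d:level10/text()", namespaces=ns)
print(content)  # ['Content at the deepest level']
```

If a path walks through several levels (e.g. //d:level2/d:level10), each step needs the prefix, since every descendant of <Doc> is in that namespace.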
    

    Or use etree.ETXPath, which accepts the same {namespace}tag notation inside an XPath expression:

    to_search = etree.ETXPath("//{urn:SomethingSomething3}level10/text()")
    level10_text = to_search(root)
    print("Text from <level10> tag:", level10_text)
    

    Prints:

    Text from <level10> tag: ['Content at the deepest level']
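If you would rather not hard-code the namespace URI, XPath's local-name() function matches on the element name alone. The trade-off is that it would also match a <level10> in any other namespace (for example under <App> or <User>), so it is a looser query than the namespaced versions. A sketch on a trimmed-down, hypothetical sample:

```python
from lxml import etree

# Hypothetical trimmed-down version of the question's document
xml_data = """<MyXmlRoot>
<Doc xmlns='urn:SomethingSomething3'>
    <level2>
        <level10>Content at the deepest level</level10>
    </level2>
</Doc>
</MyXmlRoot>"""

root = etree.fromstring(xml_data)

# local-name() ignores the namespace part of the tag, so no URI or prefix is needed
content = root.xpath('//*[local-name()="level10"]/text()')
print(content)  # ['Content at the deepest level']
```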