
Access text of next sibling


Here is part of a Jenkins XML file.

I want to extract the defaultValue of project_name with XPath.

In this case the value is *****.

<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>

I use etree from Python, but AFAIK this does not matter much since this is an XPath question.

My XPath knowledge is limited. My current approach:

for name_tag in config.findall('.//name'):
    if name_tag.text == 'project_name':
        default = name_tag.getparent().findall('defaultValue')[0].text

Here I get AttributeError: 'Element' object has no attribute 'getparent'

I thought about this again, and I think that looping in Python is the wrong approach. This should be selectable via XPath.


Solution

  • The XPath answer to your question is

    /project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()
    

    which will select as the only result

    *****
    

    This works provided your actual document does not use a namespace. You need neither the parent element nor a sibling axis.

    lxml supports this kind of XPath expression in full. The standard library's ElementTree implements only a subset of XPath, so the expression might not work there as written; see the comment by har07.


    I thought about this again, and I think that looping in Python is the wrong approach. This should be selectable via XPath.

    Yes, I agree. If you want to select a single value from a document, select it with an XPath expression and store it as a Python string directly, without looping through elements.


    Full example with lxml

    from io import StringIO

    from lxml import etree

    document_string = """<project>
        <properties>
            <hudson.model.ParametersDefinitionProperty>
                <parameterDefinitions>
                    <hudson.model.StringParameterDefinition>
                        <name>customer_name</name>
                        <description></description>
                        <defaultValue>my_customer</defaultValue>
                    </hudson.model.StringParameterDefinition>
                    <hudson.model.StringParameterDefinition>
                        <name>project_name</name>
                        <description></description>
                        <defaultValue>*****</defaultValue>
                    </hudson.model.StringParameterDefinition>
                </parameterDefinitions>
            </hudson.model.ParametersDefinitionProperty>
        </properties>
    </project>"""

    # Parse the document from an in-memory text buffer.
    tree = etree.parse(StringIO(document_string))

    # xpath() returns a list of matching text nodes; here it has exactly one entry.
    result_list = tree.xpath("/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()")

    print(result_list[0])
    

    Output:

    *****
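
If lxml is not available, the same lookup can probably be done with the standard library's xml.etree.ElementTree. The following is only a minimal sketch under a few assumptions: ElementTree implements a subset of XPath, so it uses a relative path starting with `.//`, the child-text predicate `[name='project_name']`, and `findtext()` in place of `text()`. The shortened document and the variable names `path` and `default_value` are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Jenkins config from the question.
document_string = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
</project>"""

root = ET.fromstring(document_string)

# The predicate keeps only the definition whose <name> child has the text
# 'project_name'; the final step selects its <defaultValue> child.
path = ".//hudson.model.StringParameterDefinition[name='project_name']/defaultValue"

# findtext() returns the text of the first match, or None if nothing matches.
default_value = root.findtext(path)

print(default_value)
```

Because `findtext()` returns `None` when nothing matches, a missing parameter can be detected without catching an exception.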