Access text of next sibling

Here is part of a Jenkins XML file.

I want to extract the defaultValue of project_name with XPath.

In this case the value is *****.

<?xml version='1.0' encoding='UTF-8'?>
<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
 </project>

I use etree in Python, but AFAIK this does not matter much since this is an XPath question.

My XPath knowledge is limited. My current approach:

for name_tag in config.findall('.//name'):
    if name_tag.text == 'project_name':
        default=name_tag.getparent().findall('defaultValue')[0].text

Here I get AttributeError: 'Element' object has no attribute 'getparent'

I thought about this again, and I think that looping in Python is the wrong approach. This should be selectable via XPath.

Solution

The XPath answer to your question is

/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()

which will select as the only result

*****

This assumes that your actual document does not use a namespace. You do not need to access the parent element or use a sibling axis.

Even the standard library's etree should support this kind of XPath expression, but it might not; see the comment by har07.
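If only the standard library is available, something along these lines may work. This is a minimal sketch, assuming Python 3 and a shortened copy of the document; the names CONFIG_XML and default_value are illustrative. xml.etree.ElementTree implements only a subset of XPath: the [name='project_name'] predicate is supported, but text() is not, so the text is read with findtext() instead. Its elements also have no getparent() method (that one comes from lxml), which explains the AttributeError in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Jenkins config from the question (illustrative only)
CONFIG_XML = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
</project>"""

root = ET.fromstring(CONFIG_XML)

# Pick the parameter definition whose <name> child equals 'project_name',
# then read the text of its <defaultValue> child (None if nothing matches).
default_value = root.findtext(
    ".//hudson.model.StringParameterDefinition[name='project_name']/defaultValue"
)

print(default_value)
```

The relative path starting with .// is used here because the search starts from the root element itself rather than from a document node.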

You wrote: "I thought about this again, and I think that looping in python is the wrong approach. This should be selectable via xpath."

Yes, I agree. If you want to select a single value from a document, select it with an XPath expression and store it as a Python string directly, without looping through elements.

Full example with lxml

from io import StringIO

from lxml import etree

# Same document as in the question, without the XML declaration
document_string = """<project>
    <properties>
        <hudson.model.ParametersDefinitionProperty>
            <parameterDefinitions>
                <hudson.model.StringParameterDefinition>
                    <name>customer_name</name>
                    <description></description>
                    <defaultValue>my_customer</defaultValue>
                </hudson.model.StringParameterDefinition>
                <hudson.model.StringParameterDefinition>
                    <name>project_name</name>
                    <description></description>
                    <defaultValue>*****</defaultValue>
                </hudson.model.StringParameterDefinition>
            </parameterDefinitions>
        </hudson.model.ParametersDefinitionProperty>
    </properties>
</project>"""

# Parse the string into an lxml element tree
tree = etree.parse(StringIO(document_string))

# xpath() returns a list; text() selects the matching text nodes
result_list = tree.xpath("/project/properties/hudson.model.ParametersDefinitionProperty/parameterDefinitions/hudson.model.StringParameterDefinition[name = 'project_name']/defaultValue/text()")

print(result_list[0])

Output:

*****