java, xpath, xpath-2.0, jaxp

JAXP XPath 1.0 or 2.0 - how to distinguish empty strings from non-existent values


Given the following XML instance:

<entities>
    <person><name>Jack</name></person>
    <person><name></name></person>
    <person></person>
</entities>

I am using the following code to: (a) iterate over the persons and (b) obtain the name of each person:

XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
    Node node = nodes.item(i);
    String innerXPath = "name/text()";
    String name  = xpath.compile(innerXPath).evaluate(node);
    System.out.printf("%2d -> name is %s.\n", i, name);
}

The code above is unable to distinguish between the 2nd person case (empty string for name) and the 3rd person case (no name element at all) and simply prints:

0 -> name is Jack.
1 -> name is .
2 -> name is .

Is there a way to distinguish between these two cases using a different innerXPath expression? In this SO question it seems that the XPath way would be to return an empty list, but I've tried that too:

String innerXPath = "if (name) then name/text() else ()";

... and the output is still the same.

So, is there a way to distinguish between these two cases with a different innerXPath expression? I have Saxon HE on my classpath so I can use XPath 2.0 features as well.

Update

So the best I could do based on the accepted answer is the following:

XPathExpression expr = xpath.compile("/entities/person");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
    Node node = nodes.item(i);
    String innerXPath = "name";
    NodeList names = (NodeList) xpath.compile(innerXPath).evaluate(node, XPathConstants.NODESET);
    String nameValue = null;
    if (names.getLength() > 1) throw new RuntimeException("impossible");
    if (names.getLength() == 1)
        nameValue = names.item(0).getFirstChild() == null ? "" : names.item(0).getFirstChild().getNodeValue();
    System.out.printf("%2d -> name is [%s]\n", i, nameValue);
}

The above code prints:

0 -> name is [Jack]
1 -> name is []
2 -> name is [null]

In my view this is not very satisfactory: the logic is spread across both XPath and Java code, which limits the usefulness of XPath as a notation that is independent of the host language and API. My particular use case was to keep a collection of XPaths in a property file and evaluate them at runtime to obtain the information I need, without any ad-hoc extra handling. Apparently that's not possible.


Solution

  • The JAXP API, being based on XPath 1.0, is pretty limited here. My instinct would be to return the name element itself (as a NodeList), so the XPath expression required is simply "name". Cases 1 and 2 then return a NodeList of length 1, while case 3 returns a NodeList of length 0. Cases 1 and 2 can easily be distinguished within the application by getting the string value of the node and testing whether it is zero-length.

    Using /text() is always best avoided anyway, since it causes your query to be sensitive to the presence of comments in the XML.
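
    A minimal sketch of this approach, assuming the JDK's built-in DOM parser and XPath 1.0 engine. The class NameLookup and the helper namesOf are hypothetical names introduced for illustration only. The helper selects the name element rather than its text node and maps "no element" to an empty Optional, so an empty element and a missing element stay distinct. It reads the value with getTextContent(), which skips comment children, instead of looking only at the first child:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NameLookup {

    // Hypothetical helper: one entry per <person>.
    // Optional.empty() means there is no <name> element at all;
    // Optional.of("") means the <name> element exists but is empty.
    static List<Optional<String>> namesOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList persons = (NodeList) xpath.evaluate(
                    "/entities/person", doc, XPathConstants.NODESET);

            List<Optional<String>> result = new ArrayList<>();
            for (int i = 0; i < persons.getLength(); i++) {
                // Select the element itself, not name/text().
                NodeList names = (NodeList) xpath.evaluate(
                        "name", persons.item(i), XPathConstants.NODESET);
                result.add(names.getLength() == 0
                        ? Optional.<String>empty()
                        : Optional.of(names.item(0).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath against input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entities>"
                + "<person><name>Jack</name></person>"
                + "<person><name></name></person>"
                + "<person></person>"
                + "</entities>";
        List<Optional<String>> names = namesOf(xml);
        for (int i = 0; i < names.size(); i++) {
            System.out.printf("%2d -> %s%n", i,
                    names.get(i).map(n -> "name is [" + n + "]").orElse("no name element"));
        }
    }
}
```

    The length check still lives in Java rather than in the XPath expression, so this does not remove the extra handling the question objects to; it only confines it to one place.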