xpath, saxon

XPath only append text if there is a match


In Java 17 I'm using XPath to extract data from XML by joining all the <bar>s under <foo>. I'm using Saxon 12, but I'm doing it through the JAXP API. I create an XPathExpression and then invoke it like this:

(String)xpathExpression.evaluate(context, XPathConstants.STRING)

I was hoping that this would give me null if there was no match, but apparently it does not. Let's start with this XPath expression (simplified from what I'm using):

/foo/string-join(bar, codepoints-to-string(10))

I wanted this to join all the /foo/bar strings together, separated by newlines, which it does if there is a /foo. But if there is no /foo, then instead of returning null it seems to return an empty string.

My first question would be how to detect that this XPath expression did not match /foo/. I had assumed that XPathExpression.evaluate() would return null if there was no match. (Reading the API now I guess that was just an assumption I made.)

But let's say that I'm OK with returning an empty string, and I can detect if the returned string is empty and consider that a non-match (even though semantically that is not ideal). The problem is that I want the value to end with a newline as well, so my expression looks like this:

concat(/foo/string-join(bar, codepoints-to-string(10)), codepoints-to-string(10))

This is worse: now if there is no /foo, it returns a string containing a single newline \n, because it appends the newline to the part that did not match, which it treats as the empty string.

I would prefer to find a way for this expression to return null in JAXP if /foo does not exist. But if that can't easily be done, I'd at least like to get an empty string if /foo does not exist, i.e. have concat() append text only if the inner match is successful. I have a feeling I'll have to construct some elaborate workaround, but maybe an XPath expert knows a trick or two.


Solution

  • When you use the JAXP interface with XPath 2.0, you run into the problem that the JAXP specification doesn't say what happens when the expression returns values outside the XPath 1.0 type system. So Saxon does its best to interpret the intent.

    If there is no foo element then the XPath expression returns an empty sequence. JAXP says that the raw result is converted to the required return type using XPath conversion rules. Now, in XPath 2.0 the string() function applied to an empty sequence returns a zero-length string, while the xs:string() constructor returns an empty sequence, which one might (perhaps) interpret as equivalent to a Java null. But Saxon chooses the string() conversion and returns a zero-length string.

    My advice would be to switch to the s9api interface which gives you full access to the XPath 2.0 type system. I would probably write an expression that returns a sequence of strings, and write the code to convert this into a single string in Java rather than in XPath.

    But if you want to stick with JAXP, you could use the XPath expression

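    string-join(/foo/bar ! (. || '\n'))

    (Note: the \n is converted to a newline by the Java compiler, not by the XPath engine. The || operator accepts at most one item per operand, so it is applied to each bar separately; bar || '\n' would raise a type error when foo has several bar children.)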
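
For readers who stay with JAXP but want to do the joining in Java rather than in XPath, as the answer suggests doing with s9api, the idea can be sketched roughly as follows. This is only an illustration under assumptions: the class FooBarJoiner and its methods parse and joinBars are hypothetical names, the input is assumed to be a DOM Document, and XPathFactory.newInstance() is assumed to return the JDK's built-in engine (both expressions used are plain XPath 1.0, so nothing Saxon-specific is required). It asks for a BOOLEAN first so that a missing /foo can be reported as null, then selects the bar nodes and joins them in Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; not part of JAXP or Saxon. */
final class FooBarJoiner {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Returns the string values of all /foo/bar elements, separated by
     * newlines and followed by a trailing newline, or null when the
     * document has no /foo element at all.
     */
    static String joinBars(Document doc) {
        try {
            // Assumption: the default factory (the JDK's XPath 1.0 engine) is in use.
            XPath xpath = XPathFactory.newInstance().newXPath();

            // A BOOLEAN result distinguishes "no /foo" from "a /foo with no text".
            boolean hasFoo = (Boolean) xpath.evaluate(
                    "boolean(/foo)", doc, XPathConstants.BOOLEAN);
            if (!hasFoo) {
                return null;
            }

            // Select the nodes and do the joining in Java instead of XPath.
            NodeList bars = (NodeList) xpath.evaluate(
                    "/foo/bar", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < bars.getLength(); i++) {
                values.add(bars.item(i).getTextContent());
            }
            return String.join("\n", values) + "\n";
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document withFoo = parse("<foo><bar>a</bar><bar>b</bar></foo>");
        Document withoutFoo = parse("<other/>");
        System.out.print(joinBars(withFoo));
        System.out.println(joinBars(withoutFoo) == null ? "no /foo" : "has /foo");
    }
}
```

With this sketch, joinBars returns "a\nb\n" for <foo><bar>a</bar><bar>b</bar></foo> and null for a document without a /foo root. A foo with no bar children yields a single newline, which matches what the concat() expression in the question produces when /foo exists.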