Correct way to retrieve resulting text as string with Saxon XSLT library


I'm using an XSLT 3 template with <xsl:output method="text"/>, which extracts some lines of text from an XML source document. The template is very particular, producing the individual lines and even the newlines (LF) in the right places.

Invoking the Saxon HE 12.2 JAR with Java 17 from the command line, I verify that the output text is precisely what I'm looking for, suitable for a .txt file.

The next step is to do the same thing programmatically, so I followed the documentation for using the s9api for transformations. Since I had used <xsl:output method="text"/>, I assumed that an XSLT processor would output only text. Instead it appears that transformer.applyTemplates(new StreamSource(xmlInputStream)) produces an XdmValue, which is itself a sequence of XdmItems.

Investigating further, it seems that each XdmItem wraps an XdmNode of kind TEXT! (I see that this mirrors the DOM's text nodes.) There is a text node for each piece of output from the stylesheet, including a separate node for each newline in the output, e.g. from <xsl:text>&#10;</xsl:text> in the template.

As I mentioned I had assumed that <xsl:output method="text"/> would have made the transformer skip the XML world altogether and simply output the text to a text buffer. I imagined some sort of produceText(String) method, similar to Hadoop MapReduce emitting values, which would be collected immediately to a buffer without the need to wrap them each in any sort of node. But I guess the XML foundation still presents itself to some extent, even in "text" output mode.

To me these nodes seem like needless overhead, as <xsl:output method="text"/> plainly indicates I don't need XML output at all. Maybe for historical reasons it's unavoidable. In any case, I understand that I can extract the text using this:

// requires: import static java.util.stream.Collectors.joining;
String text = xdmValue.stream().map(XdmItem::getStringValue).collect(joining());

My question is simply: is this the most efficient way to extract XSLT text output using Saxon, or is there a simpler, more direct way that skips the intermediate overhead of XdmNode items?


Solution

  • There is an overload of the applyTemplates method (https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/s9api/Xslt30Transformer.html#applyTemplates(net.sf.saxon.s9api.XdmValue,net.sf.saxon.s9api.Destination)) that writes to a Destination such as a Serializer (over a stream, file, or writer). I would suggest using it if you want Saxon to serialize the transformation result based on your xsl:output declarations.
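
As a rough sketch of that suggestion, the following serializes straight into a StringWriter instead of collecting XdmItems. It assumes Saxon HE 12.x on the classpath and uses the applyTemplates(Source, Destination) variant rather than the XdmValue one linked above. The class name SaxonTextOutput, the method transformToString, and the sample stylesheet and XML are made up for illustration and are not from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;

// Hypothetical helper class; the name and sample data are illustrative only.
public class SaxonTextOutput {

    // Sample stylesheet (assumption): emits each <item> followed by a line feed.
    public static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"3.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"items/item\">"
      + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Sample input document (assumption).
    public static final String SAMPLE_XML =
        "<items><item>alpha</item><item>beta</item></items>";

    /** Runs the stylesheet over the XML and returns the serialized result as a String. */
    public static String transformToString(String xslt, String xml) throws SaxonApiException {
        Processor processor = new Processor(false); // false: no licensed features, i.e. Saxon HE
        XsltCompiler compiler = processor.newXsltCompiler();
        XsltExecutable executable = compiler.compile(new StreamSource(new StringReader(xslt)));
        Xslt30Transformer transformer = executable.load30();

        StringWriter out = new StringWriter();
        // A Serializer created from the transformer starts from the stylesheet's xsl:output settings.
        Serializer serializer = transformer.newSerializer(out);
        // The result is written to the Serializer rather than returned as an XdmValue.
        transformer.applyTemplates(new StreamSource(new StringReader(xml)), serializer);
        return out.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.print(transformToString(SAMPLE_XSLT, SAMPLE_XML));
    }
}
```

To write a .txt file instead of building a String, the same pattern should work with transformer.newSerializer(File) or transformer.newSerializer(OutputStream) in place of the StringWriter.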