java, xslt, xslt-2.0, saxon

Is it possible to create an XSL Transformer Output Stream?


Is it possible to create a "TransformerOutputStream", which extends the standard java.io.OutputStream, wraps a provided output stream and applies an XSL transformation? I can't find any combination of APIs which allows me to do this.

The key point is that, once created, the TransformerOutputStream may be passed to other APIs which accept a standard java.io.OutputStream.

Minimal usage would be something like:

java.io.InputStream in = getXmlInput();
java.io.OutputStream out = getTargetOutput();

javax.xml.transform.Templates templates = createReusableTemplates();        // could also use S9API
TransformerOutputStream tos = new TransformerOutputStream(out, templates);  // extends OutputStream

com.google.common.io.ByteStreams.copy(in, tos);

// possibly flush/close tos if required by implementation

That's a JAXP example, but as I'm currently using Saxon an S9API solution would be fine too.

The main avenue I've pursued is along the lines of:

  • a class which extends java.io.OutputStream and implements org.xml.sax.ContentHandler
  • an XSL transformer based on an org.xml.sax.ContentHandler

But I can't find implementations of either of these, which seems to suggest that either no one else has ever tried to do this, there is some problem which makes it impractical, or my search skills just are not that good.

I can understand that with some templates an XML transformer may require access to the entire document, in which case a SAX content handler provides no advantage. But there must also be simple transformations which could be applied to the stream as it passes through? This kind of interface would leave that decision up to the transformer implementation.

I have written and am currently using a class which provides this interface, but it just collects the output data in an internal buffer and then uses a standard JAXP StreamSource to read that buffer on flush or close, so it ends up buffering the entire document.


Solution

  • You could make your TransformerOutputStream extend ByteArrayOutputStream, and its close() method could take the underlying byte array, wrap it in a ByteArrayInputStream, and invoke a transformation with the input taken from this InputStream.
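
A minimal sketch of that buffering approach, using only standard JAXP. The class name BufferingTransformerOutputStream is hypothetical, and closing the wrapped target stream on close() is an assumption about the desired behaviour, not something the question specifies:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical name: collects all written bytes, then transforms them on close().
class BufferingTransformerOutputStream extends ByteArrayOutputStream {
    private final OutputStream target;
    private final Templates templates;
    private boolean closed;

    BufferingTransformerOutputStream(OutputStream target, Templates templates) {
        this.target = target;
        this.templates = templates;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // make repeated close() calls harmless
        }
        closed = true;
        try {
            // buf and count are the protected fields of ByteArrayOutputStream,
            // so the buffered bytes are read without making a copy.
            templates.newTransformer().transform(
                    new StreamSource(new ByteArrayInputStream(buf, 0, count)),
                    new StreamResult(target));
        } catch (TransformerException e) {
            throw new IOException("XSL transformation failed", e);
        } finally {
            target.close(); // assumption: the wrapper owns the target stream
        }
    }
}
```

Nothing reaches the target until close() is called, so the whole document is held in memory.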

    But it seems you also want to avoid putting the entire contents of the stream in memory. So let's assume that the transformation you want to apply is an XSLT 3.0 streamable transformation. Unfortunately, although Saxon as a streaming XSLT transformer operates largely in push mode (by "push" I mean that the data supplier invokes the data consumer, whereas "pull" means that the data consumer invokes the data supplier), the first stage, of reading and parsing the input, is always in pull mode -- I don't know of an XML parser to which you can push lexical XML input.

    This means there's a push-pull conflict here. There are two solutions to a push-pull conflict. One is to buffer the data in memory (which is the ByteArrayOutputStream approach mentioned earlier). The other is to use two threads, with one writing to a shared buffer and the other reading from it. This can be achieved using a PipedOutputStream (https://docs.oracle.com/javase/8/docs/api/index.html?java/io/PipedOutputStream.html) in the writing thread and a PipedInputStream in the reading thread.

    Caveat: I haven't actually tried this, but I see no reason why it shouldn't work.
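
A rough sketch of the two-thread idea, using only java.io piped streams and standard JAXP. The class name PipedTransformerOutputStream, the 8192-byte pipe size, and the choice to close the target stream are all assumptions made for illustration. The pipe only removes the byte buffer in front of the parser; whether the transformation itself streams still depends on the processor and the stylesheet.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical name: the caller pushes bytes in, a second thread pulls them
// out of the pipe and feeds them to the transformer.
class PipedTransformerOutputStream extends PipedOutputStream {
    private final OutputStream target;
    private final Thread worker;
    private volatile Exception failure;

    PipedTransformerOutputStream(OutputStream target, Templates templates) throws IOException {
        this.target = target;
        // Connects the reading end to this stream; the pipe size is an arbitrary choice.
        PipedInputStream pipeIn = new PipedInputStream(this, 8192);
        worker = new Thread(() -> {
            try {
                templates.newTransformer().transform(
                        new StreamSource(pipeIn), new StreamResult(target));
            } catch (Exception e) {
                failure = e; // reported to the writer in close()
            } finally {
                // Closing the read end makes a blocked or later write() fail
                // with an IOException instead of waiting forever.
                try { pipeIn.close(); } catch (IOException ignored) { }
            }
        }, "xslt-pipe-reader");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public void close() throws IOException {
        super.close(); // signals end-of-stream to the reading thread
        try {
            worker.join(); // wait until the transformation has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the transformation", e);
        } finally {
            target.close(); // assumption: the wrapper owns the target stream
        }
        if (failure != null) {
            throw new IOException("XSL transformation failed", failure);
        }
    }
}
```

The caller must call close(), since that is what tells the parser the document has ended and is where any transformation error surfaces.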

    Note that the topic of streaming in XSLT 3.0 is fairly complex; you will need to learn about it before you can make much progress here. I would start with Abel Braaksma's talk from XML London 2014: https://xmllondon.com/2014/presentations/braaksma