Consider that I have an XML document loaded as a byte[] that is 5MB in size. Being a byte array, it takes up exactly 5MB of memory. I have a stylesheet Source that I want to apply to this document, performing something like the following:
final TransformerFactory transformerFactory = TransformerFactory.newInstance();
final Transformer transformer = transformerFactory.newTransformer(styleSheet);
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
final StringWriter writer = new StringWriter();
transformer.transform(convertStringToSource(filePayload), new StreamResult(writer));
return writer.getBuffer().toString().getBytes();
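One way to trim the result-side copies in the snippet above is to stream the output straight into a byte buffer instead of going through StringWriter -> String -> byte[], and to compile the stylesheet once as a Templates object so it can be reused across documents and threads. A sketch of that approach (the class name, identity stylesheet, and test payload here are illustrative assumptions, not from the original code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LowCopyTransform {
    // Compile the stylesheet once; a Templates object is thread-safe and reusable,
    // unlike a Transformer, which is neither.
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    public static byte[] transform(Templates templates, byte[] xmlPayload) throws Exception {
        Transformer transformer = templates.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Serialize straight into a byte buffer: this avoids the
        // StringWriter -> toString() -> getBytes() chain, each step of which
        // copies the whole result again.
        ByteArrayOutputStream out = new ByteArrayOutputStream(xmlPayload.length);
        transformer.transform(new StreamSource(new ByteArrayInputStream(xmlPayload)),
                              new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Identity stylesheet used only to make the sketch runnable.
        String xslt = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                    + "<xsl:template match='@*|node()'><xsl:copy>"
                    + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
                    + "</xsl:stylesheet>";
        Templates templates = FACTORY.newTemplates(
                new StreamSource(new ByteArrayInputStream(xslt.getBytes(StandardCharsets.UTF_8))));
        byte[] result = transform(templates, "<a><b>hi</b></a>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(result, StandardCharsets.UTF_8)); // prints <a><b>hi</b></a>
    }
}
```

This doesn't reduce what the processor allocates for its internal tree, but it does cut two full-size copies of the result and the per-document stylesheet compilation.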
When run on the server (WebSphere Application Server 7, with limits on contiguous memory allocation) I get heap dumps indicating that objects of 10-15MB are being created. I presume the transform() method internally creates an object for the original XML, one for the stylesheet, and one for the result; added together, that puts me at a minimum of 2*input + stylesheet MB. Is there a more efficient way to do this, one that keeps my footprint to a minimum?
You might say it's only 10MB, but in my case performance is critical. The time it takes to allocate that much contiguous memory adds up when I have to transform hundreds or thousands of documents at a time. That is why our server admins have set this limit as a warning of sorts that more memory is being allocated than recommended.
FYI, the following JVM parameter sets this in WebSphere: -Xdump:stack:events=allocation,filter=#5m
A factor of 3 expansion between the raw XML size and the size of the in-memory tree is certainly normal; in fact it's on the low side. That alone is consistent with the 10-15MB objects you are seeing for a 5MB document. See for example http://dev.saxonica.com/blog/mike/2012/09/
Streamed transformation is starting to become possible for a limited class of transformations. See for example http://www.saxonica.com/documentation/sourcedocs/streaming.xml. But when your documents are only 5MB in size, I'm not sure it's the right approach for you, at least not without further evidence.
It seems to me that you have concluded that memory allocation by the XSLT processor is the critical factor affecting the performance of your workload, without any real evidence that this is the case. It would be interesting to see, for example, how the transformation time compares with the parsing time; many people are surprised to find that the transformation cost is sometimes tiny compared to the parsing cost. Before addressing one aspect of your system's performance, you need to work out where the true bottlenecks are.
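Splitting the measurement along those lines is straightforward: parse into a tree first, then transform the already-parsed tree, timing each phase separately. A minimal sketch (it uses an identity transform as a stand-in for your real stylesheet, and a tiny synthetic document in place of your 5MB payload; both are assumptions):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class ParseVsTransform {
    /** Returns {parseNanos, transformNanos} for the given XML payload. */
    public static long[] measure(byte[] xml) throws Exception {
        // Phase 1: time the parse into a DOM tree.
        long t0 = System.nanoTime();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        long parseNs = System.nanoTime() - t0;

        // Phase 2: time a transform of the already-parsed tree.
        // newTransformer() with no stylesheet gives the identity transform;
        // with a real stylesheet you would use templates.newTransformer() instead.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        t0 = System.nanoTime();
        identity.transform(new DOMSource(doc), new StreamResult(new ByteArrayOutputStream()));
        long transformNs = System.nanoTime() - t0;
        return new long[] { parseNs, transformNs };
    }

    public static void main(String[] args) throws Exception {
        // Tiny synthetic document; substitute your real payload here.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 1000; i++) sb.append("<item>").append(i).append("</item>");
        sb.append("</root>");
        long[] ns = measure(sb.toString().getBytes());
        System.out.printf("parse: %d us, transform: %d us%n", ns[0] / 1000, ns[1] / 1000);
    }
}
```

Run it against representative documents (and repeat the measurement a few times to let the JIT warm up) before deciding whether the transform, the parse, or the allocation pattern is what actually needs attention.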