I'm using the following code to transform a large XML stream into another stream:
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXResult;
import javax.xml.transform.stax.StAXSource;
public class TryMe
{
    public static void main (final String[] args)
    {
        XMLInputFactory inputFactory = null;
        XMLEventReader eventReaderXSL = null;
        XMLEventReader eventReaderXML = null;
        XMLOutputFactory outputFactory = null;
        XMLEventWriter eventWriter = null;
        Source XSL = null;
        Source XML = null;
        inputFactory = XMLInputFactory.newInstance();
        outputFactory = XMLOutputFactory.newInstance();
        inputFactory.setProperty("javax.xml.stream.isSupportingExternalEntities", Boolean.TRUE);
        inputFactory.setProperty("javax.xml.stream.isNamespaceAware", Boolean.TRUE);
        inputFactory.setProperty("javax.xml.stream.isReplacingEntityReferences", Boolean.TRUE);
        try
        {
            eventReaderXSL = inputFactory.createXMLEventReader("my_template",
                    new InputStreamReader(TryMe.class.getResourceAsStream("my_template.xsl")));
            eventReaderXML = inputFactory.createXMLEventReader("big_one", new InputStreamReader(
                    TryMe.class.getResourceAsStream("big_one.xml")));
        }
        catch (final javax.xml.stream.XMLStreamException e)
        {
            System.out.println(e.getMessage());
        }
        // get a TransformerFactory object
        final TransformerFactory transfFactory = TransformerFactory.newInstance();
        // define the Source object for the stylesheet
        try
        {
            XSL = new StAXSource(eventReaderXSL);
        }
        catch (final javax.xml.stream.XMLStreamException e)
        {
            System.out.println(e.getMessage());
        }
        Transformer tran2 = null;
        // get a Transformer object
        try
        {
            tran2 = transfFactory.newTransformer(XSL);
        }
        catch (final javax.xml.transform.TransformerConfigurationException e)
        {
            System.out.println(e.getMessage());
        }
        // define the Source object for the XML document
        try
        {
            XML = new StAXSource(eventReaderXML);
        }
        catch (final javax.xml.stream.XMLStreamException e)
        {
            System.out.println(e.getMessage());
        }
        // create an XMLEventWriter object
        try
        {
            eventWriter = outputFactory.createXMLEventWriter(new OutputStreamWriter(System.out));
        }
        catch (final javax.xml.stream.XMLStreamException e)
        {
            System.out.println(e.getMessage());
        }
        // define the Result object
        final Result XML_r = new StAXResult(eventWriter);
        // call the transform method
        try
        {
            tran2.transform(XML, XML_r);
        }
        catch (final javax.xml.transform.TransformerException e)
        {
            System.out.println(e.getMessage());
        }
        // clean up
        try
        {
            eventReaderXSL.close();
            eventReaderXML.close();
            eventWriter.close();
        }
        catch (final javax.xml.stream.XMLStreamException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
my_template is something like this:
<xsl:stylesheet version = '1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:preserve-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@k8[parent::point]">
    <xsl:attribute name="k8">
      <xsl:value-of select="'xxxxxxxxxxxxxx'"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
and the XML is a very long list of
<data>
<point .... k8="blablabla" ... ></point>
<point .... k8="blablabla" ... ></point>
<point .... k8="blablabla" ... ></point>
....
<point .... k8="blablabla" ... ></point>
</data>
If I use an identity transformer (calling transfFactory.newTransformer() instead of transfFactory.newTransformer(XSL)), the output is produced while the input stream is still being processed. With my template, however, the transformer reads all of the input and only then starts to produce output. With a large stream, an out-of-memory error very often occurs before any result appears.
Any ideas? I can't work out what's wrong in my code or my XSLT.
Many thanks in advance!!
Using XSLT is probably not the best approach here. As others have pointed out, your solution requires the processor to read the entire document into memory before it writes any output. You might consider using a SAX parser instead: read each node sequentially, perform any transformation required (using a data-driven mapping if necessary), and write out the transformed data. This avoids building an entire document tree in memory and could enable significantly faster processing, since you are not constructing a complex document just to write it out again.
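As a rough illustration of the SAX approach, here is a minimal sketch that rewrites the k8 attribute of each point element as the events pass through. The class name PointK8Filter and the rewrite helper are hypothetical names made up for this example. It assumes the elements are in no namespace, as in the sample XML, and it reuses the identity transformer purely as a serializer, which the question reports does stream. Whether it streams in your environment depends on the JAXP implementation in use.

```java
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper (not from the original post): a SAX filter that
// replaces the value of k8 on every <point> element and passes all
// other events through unchanged.
class PointK8Filter extends XMLFilterImpl {
    private final String replacement;

    PointK8Filter(XMLReader parent, String replacement) {
        super(parent);
        this.replacement = replacement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Same condition as the template match="@k8[parent::point]".
        if ("".equals(uri) && "point".equals(localName)) {
            int index = atts.getIndex("", "k8");
            if (index >= 0) {
                // Copy first: the parser may reuse the Attributes object it passed in.
                AttributesImpl copy = new AttributesImpl(atts);
                copy.setValue(index, replacement);
                atts = copy;
            }
        }
        super.startElement(uri, localName, qName, atts);
    }

    // Parses 'in' with SAX, filters the events, and serializes them to 'out'
    // using an identity transformer (no stylesheet involved).
    static void rewrite(Reader in, Writer out, String replacement) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();
        XMLReader filter = new PointK8Filter(parser, replacement);
        TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
    }

    // Example usage: java PointK8Filter < big_one.xml > result.xml
    public static void main(String[] args) throws Exception {
        try (Reader in = new InputStreamReader(System.in, StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8)) {
            rewrite(in, out, "xxxxxxxxxxxxxx");
        }
    }
}
```

The filter holds only the current element's attributes, so its own memory use should stay flat regardless of how many point elements the input contains. If more attributes need rewriting, the hard-coded names could be replaced by a lookup table, which is one way to read the "data-driven mapping" mentioned above.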
Ask yourself whether the output format is simple and stable, and then reconsider the use of XSLT. For large datasets of regular data, you might also consider whether XML is a good file format for transferring the information.