Tags: java, xslt, namespaces, jaxb, saxon

Writing XSLT-transformed XML fragments to an XMLStreamWriter


I've got the following problem:

  • Large output file (zip), which contains one XML document ("FeatureCollection")
  • Relatively small XML fragments
  • Each fragment must be written as a "featureMember" to the XMLStreamWriter after XSLT transformation
  • Namespace definitions only on the "FeatureCollection" (root) tag

I got this to work by using a separate byte stream for parsing the fragments. I also wrap the XMLStreamWriter so that the XSLT transformer (Saxon) cannot open or close a document, or close the stream.

However, I feel that the solution is too complicated. It should be possible to use the JAXB context as the source directly, without an intermediate byte stream. See the code snippet:

        try {
            XMLStreamWriterWrapper writer = getWriter( xmlFile );
            for ( Map.Entry<String, String> entry : prefixMapper.getNamespaces().entrySet() ) {
                writer.setPrefix( entry.getValue(), entry.getKey() );
            }

            writer.getWrapperWriter().writeStartDocument();
            writer.writeStartElement( GML_URI, "FeatureCollection" );

            for ( Map.Entry<String, String> entry : prefixMapper.getNamespaces().entrySet() ) {
                writer.getWrapperWriter().writeNamespace( entry.getValue(), entry.getKey() );
            }

            while ( dtoIterator.hasNext() ) {
                writer.writeStartElement( GML_URI, "featureMember" );
                D dto = dtoIterator.next();
                hideAttributes( dto );

                J jaxb = transformToJaxb( dto );

                Source untransformed = new JAXBSource( jaxbContext, getRootElement( jaxb ) );
                getTransformer().transform( untransformed, new StAXResult( writer ) );
                writer.writeEndElement();
            }

            writer.writeEndElement();
            writer.getWrapperWriter().writeEndDocument();
            writer.getWrapperWriter().flush();
            writer.getWrapperWriter().close();
        }
        catch ( IOException | JAXBException | TransformerException | XMLStreamException e ) {
            LOG.error( e );
            throw new IllegalArgumentException( e );
        }
        
    private XMLStreamWriterWrapper getWriter( File xmlFile ) throws XMLStreamException, FileNotFoundException, IOException {
        XMLOutputFactory xof = XMLOutputFactory.newFactory();
        xof.setProperty( XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE );

        XMLStreamWriter writer = xof.createXMLStreamWriter( new BufferedOutputStream( new FileOutputStream( xmlFile ) ) );

        return new XMLStreamWriterWrapper( writer );
    }

The expected result (from the non-optimized solution):

    <?xml version="1.0" ?><gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:brocom="http://www.broservices.nl/xsd/brocommon/3.0" xmlns:bro="http://www.pdok.nl/bro">
    <gml:featureMember>
        <bro:Characteristics gml:id="BRO_id_1">
            <brocom:broId>id_1</brocom:broId>
        </bro:Characteristics>
    </gml:featureMember>
    <gml:featureMember>
        <bro:Characteristics gml:id="BRO_id_2">
            <brocom:broId>id_2</brocom:broId>
        </bro:Characteristics>
    </gml:featureMember>
    </gml:FeatureCollection>

However, the result (from the code snippet above) is:

    <?xml version="1.0" ?><gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:brocom="http://www.broservices.nl/xsd/brocommon/3.0" xmlns:bro="http://www.pdok.nl/bro">
    <gml:featureMember>
        <bro:Characteristics gml:id="BRO_id_1">
            <broId xmlns="http://www.broservices.nl/xsd/brocommon/3.0">id_1</broId>
        </bro:Characteristics>
    </gml:featureMember>
    <gml:featureMember>
        <bro:Characteristics gml:id="BRO_id_2">
            <broId xmlns="http://www.broservices.nl/xsd/brocommon/3.0">id_2</broId>
        </bro:Characteristics>
    </gml:featureMember>
    </gml:FeatureCollection>

Questions:

  1. The XMLStreamWriter seems to ignore the IS_REPAIRING_NAMESPACES property. What is wrong?
  2. Can I configure the Saxon transformer so that it operates on partial XML? In other words, do I really need to wrap the XMLStreamWriter so that the transformer does not write a document start/end or close the stream altogether?
  3. Am I defining the namespaces correctly (with setPrefix and writeNamespace)?
  4. When using a JAXB marshaller, I can set properties on the marshaller such as JAXB_FORMATTED_OUTPUT and JAXB_FRAGMENT. Can I do this in this solution as well?

Solution

  • Note that you could use a Saxon implementation of XMLStreamWriter in place of the one you are using (Processor.newSerializer().getXMLStreamWriter()). It's possible this might give you more control and perhaps solve the namespace issues.

    Rather than supplying new StAXResult(writer) as the second argument of transform(), you could try supplying new net.sf.saxon.stax.ReceiverToXMLStreamWriter(writer), and you could then perhaps subclass ReceiverToXMLStreamWriter so that the startDocument() and endDocument() calls do nothing.

    As regards XMLStreamWriter handling of namespaces, I'm afraid the API specification is very obscure. I found it helpful to consult http://veithen.github.io/2009/11/01/understanding-stax.html although it has no official standing. I can't offer any guarantee that the Saxon interpretation is what the authors of the API intended (there's no reference implementation or test suite).
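
If subclassing the Saxon receiver is not an option, the same "do nothing on document start/end" idea can be sketched against the plain JDK StAX API with a dynamic proxy. This is only an illustration under assumptions: `FragmentWriters`, `fragmentView` and `demo` are hypothetical names, and this is not the `XMLStreamWriterWrapper` from the question (whose source is not shown). The proxy forwards every call to the real writer except `writeStartDocument`, `writeEndDocument` and `close`, which it silently drops.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Saxon or the JDK.
final class FragmentWriters {

    // Calls that a fragment producer must not forward to the shared writer.
    private static final Set<String> SUPPRESSED =
            Set.of("writeStartDocument", "writeEndDocument", "close");

    // Returns a view of 'target' that ignores document-level calls.
    static XMLStreamWriter fragmentView(XMLStreamWriter target) {
        return (XMLStreamWriter) Proxy.newProxyInstance(
                FragmentWriters.class.getClassLoader(),
                new Class<?>[] { XMLStreamWriter.class },
                (proxy, method, args) -> {
                    if (SUPPRESSED.contains(method.getName())) {
                        return null; // all suppressed methods return void
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the original XMLStreamException
                    }
                });
    }

    // Writes one fragment through the view, inside an element opened on the real writer.
    static String demo() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter real = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            real.writeStartElement("FeatureCollection");

            XMLStreamWriter view = fragmentView(real);
            view.writeStartDocument();            // dropped
            view.writeStartElement("featureMember");
            view.writeCharacters("id_1");
            view.writeEndElement();
            view.writeEndDocument();              // dropped, so FeatureCollection stays open
            view.close();                         // dropped

            real.writeEndElement();
            real.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the question's loop, the view would be what gets passed to `new StAXResult(...)`, while the document start, root element and final close are issued on the real writer. Whether a given transformer calls other methods that need suppressing is something to check against that transformer.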
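
On the setPrefix/writeNamespace question, a small experiment against the JDK's built-in StAX implementation can help separate what the writer does by itself from what the transformer feeds into it. The sketch below is an assumption-laden illustration, not a statement about Saxon or other StAX implementations: `PrefixBindingDemo` and `render` are hypothetical names, and the writer is deliberately created in non-repairing mode. It binds the prefix with `setPrefix` before the root element is started (the `setPrefix` Javadoc says a binding made before any start element lands in the root scope), then emits the declaration with `writeNamespace`.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; uses only the JDK's javax.xml.stream API.
final class PrefixBindingDemo {

    static final String GML_URI = "http://www.opengis.net/gml/3.2";

    static String render() {
        try {
            StringWriter out = new StringWriter();
            XMLOutputFactory xof = XMLOutputFactory.newFactory();
            // Non-repairing mode: the writer emits only the declarations we write ourselves.
            xof.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.FALSE);
            XMLStreamWriter w = xof.createXMLStreamWriter(out);

            // 1. Bind the prefix so writeStartElement(uri, name) can look it up.
            w.setPrefix("gml", GML_URI);
            // 2. Start the root element; the prefix comes from the binding above.
            w.writeStartElement(GML_URI, "FeatureCollection");
            // 3. Emit the xmlns:gml declaration on the root element.
            w.writeNamespace("gml", GML_URI);

            // A child in the same namespace reuses the inherited binding.
            w.writeStartElement(GML_URI, "featureMember");
            w.writeEndElement();

            w.writeEndElement();
            w.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

If elements written directly this way come out prefixed while transformer output does not, that suggests the prefixes are being chosen upstream of the writer rather than by it.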