
SAX Transformer and end of line after <?xml ... ?>


I generate my XML with the following code:

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
...
XMLOutputFactory xMLOutputFactory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = xMLOutputFactory.createXMLStreamWriter(stringWriter);
writer.writeStartDocument("UTF-8", "1.0");
writer.writeCharacters("\n");
//I tried also writer.writeCharacters(System.getProperty("line.separator"));
writer.writeStartElement("settings");
...

To turn the single-line XML into an indented, multi-line document, I use the following code:

public String transform(final String xml) throws XMLStreamException, TransformerException {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    Writer writer = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
    return writer.toString();
}

This is the result:

<?xml version="1.0" encoding="UTF-8"?><settings>
   ...
</settings>

As you can see, <settings> is on the first line. How can I move <settings> to the second line, to get the following result?

<?xml version="1.0" encoding="UTF-8"?>
<settings>
   ...
</settings>

How can I do this?


Solution

  • Let's assume you are using the built-in XSLT processor that comes with Java (a derivative of Xalan). That's an XSLT 1.0 processor, so we need to look at the XSLT 1.0 specification.

    This is what XSLT 1.0 says about indent="yes":

    If the indent attribute has the value yes, then the xml output method may output whitespace in addition to the whitespace in the result tree (possibly based on whitespace stripped from either the source document or the stylesheet) in order to indent the result nicely; if the indent attribute has the value no, it should not output any additional whitespace. The default value is no. The xml output method should use an algorithm to output additional whitespace that ensures that the result if whitespace were to be stripped from the output using the process described in [3.4 Whitespace Stripping] with the set of whitespace-preserving elements consisting of just xsl:text would be the same when additional whitespace is output as when additional whitespace is not output.

    That's all rather convoluted but the bottom line is that the processor MAY output a newline at the point where you want it, but is under no obligation to do so.

    If you use Saxon as your XSLT processor, then it does output a newline at this point.

    But you haven't said why this newline is so important to you. You describe not having it as a "problem", but why is it a problem? If you parse the generated document using a standard XML parser then any newline at this point will be ignored. There is one case where it makes a difference, namely if the XML you generate is used as an external parsed entity incorporated into some larger document. But for that case you definitely DON'T want the newline (which is perhaps why Xalan doesn't output it).

    NOTE: See also "Remove space in between doctype in XML using XSLT", where the user is complaining about newlines in the serialized output that, in that case, are not wanted. If you care about such differences between alternative serializations of the same document, which won't affect the way any conformant parser handles the document, then (a) you're probably going to have to write your own serializer, (b) you're going to lose one of the major benefits of XML, which is the availability of lots of conformant tools, and (c) you're doing something wrong: probably using a non-conformant parser (or no parser at all) to process the generated XML.
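
If the line break is wanted anyway (say, for human readers or a diff tool), one possible workaround with the built-in processor is to suppress the declaration that the transformer writes and emit your own, followed by a newline. The following is only a minimal sketch under that assumption, not a guaranteed recipe: the class name DeclarationOnOwnLine and the hard-coded declaration string are made up for illustration, and the encoding named in the hand-written declaration has to match whatever encoding you actually use when the string is written out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named here only for the example.
final class DeclarationOnOwnLine {

    // Assumption: the output will later be stored as UTF-8.
    private static final String DECLARATION =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    static String transform(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Ask the serializer not to write its own <?xml ... ?> declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // Write the declaration by hand, followed by the wanted line break.
        out.write(DECLARATION);
        out.write("\n");
        // The transformer appends the serialized document after the declaration.
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }
}
```

Calling DeclarationOnOwnLine.transform(xml) with the one-line document from the question should then return a string whose first line holds only the declaration. Any conformant parser treats this output and the original single-line form as the same document, so the change matters only to whatever reads the text without parsing it.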