Tags: xml, xml-serialization, xerces, xalan

Why is Apache Xerces/Xalan adding additional carriage returns to my serialized output?


I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1 and I'm having problems with additional carriage return characters in the serialized XML.

I have this (pseudo) code:

String myString = ...; // payload containing \r\n line breaks
Document doc = ...;

// Wrap the payload in a CDATA section inside an <item> element
Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));

// Serialize the DOM tree into an in-memory byte stream
Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
transformer.transform(new DOMSource(doc), result);

myString contains line breaks (\r\n); it is actually base64-encoded data. When I look at the serialized output, however, there are additional \r characters.

Input:

Line 1 \r\n
Line 2 \r\n
Line 3 \r\n

Output:

Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n

If I use createTextNode instead of createCDATASection the output becomes even more interesting:

Line 1 &#13;\r\n
Line 2 &#13;\r\n
Line 3 &#13;\r\n

The additional character seems to be introduced during serialization; the DOM tree itself appears to be correct (according to getTextContent()).
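For anyone who wants to check this themselves, here is a minimal, self-contained sketch of the snippet above. The class name CrReproduction and the helpers visible and roundTrip are made up for illustration, and the sample payload is an assumption standing in for the real base64 data. It uses whatever JAXP implementation TransformerFactory.newInstance() finds, so the extra \r should only be expected with Xalan on the classpath and a \r\n platform line separator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CrReproduction {

    // Hypothetical helper: makes CR and LF visible so stray characters are easy to spot.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Hypothetical helper: puts the payload into <item><![CDATA[...]]></item>,
    // serializes the document, and returns { DOM text content, serialized XML }.
    static String[] roundTrip(String payload) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.appendChild(doc.createCDATASection(payload));
            doc.appendChild(item);

            // Identity transform: copies the DOM tree to the output stream.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(stream));

            String serialized = new String(stream.toByteArray(), StandardCharsets.UTF_8);
            return new String[] { item.getTextContent(), serialized };
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "Line 1 \r\nLine 2 \r\nLine 3 \r\n"; // sample data, not the real payload
        String[] result = roundTrip(payload);
        System.out.println("DOM text:   " + visible(result[0]));
        System.out.println("Serialized: " + visible(result[1]));
    }
}
```

Comparing the two printed lines shows whether the DOM content and the serialized bytes differ, which is how the serializer can be singled out as the place where the extra character appears.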

Why is this happening? What can I do to fix this?


Solution

  • I guess you are seeing this problem on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream) obtains the line separator from System.getProperty("line.separator"), which is \r\n on Windows. When the serializer writes \r\n, it passes the \r through unchanged, treats the \n as the end-of-line marker, and replaces it with the platform line separator, so the output is \r + lineSeparator = \r\r\n. Although this sounds strange, it is not considered a bug; see [1]. Because it was frequently reported as one, a Xalan extension property was added [2]. You can set it programmatically:

    transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");
    

    or

    <xsl:output xalan:line-separator="&#10;" />
    

    where xalan is a prefix bound to the namespace URI "http://xml.apache.org/xalan".

    [1] https://issues.apache.org/jira/browse/XALANJ-1660

    [2] https://issues.apache.org/jira/browse/XALANJ-2093
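
To put the property in context, here is a minimal sketch of a complete serialization call with the line separator forced to \n. The class name LineSeparatorFix, the serialize helper, and the sample payload are assumptions made for illustration. The property name is the one quoted above. It is an extension of Apache Xalan's serializer, so this sketch does not assume that other JAXP implementations (for example the one bundled with the JDK) honor it. Because the name is namespace-qualified, an implementation that does not recognize it should ignore it rather than throw.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LineSeparatorFix {

    // Xalan serializer extension property, as named in the answer above.
    static final String XALAN_LINE_SEPARATOR =
            "{http://xml.apache.org/xalan}line-separator";

    // Hypothetical helper: serializes <item><![CDATA[payload]]></item>
    // with the output line separator set to a bare line feed.
    static String serialize(String payload) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.appendChild(doc.createCDATASection(payload));
            doc.appendChild(item);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Ask the serializer to write "\n" instead of the platform
            // line separator whenever it emits an end of line.
            transformer.setOutputProperty(XALAN_LINE_SEPARATOR, "\n");

            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(stream));
            return new String(stream.toByteArray(), StandardCharsets.UTF_8);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("Line 1 \r\nLine 2 \r\nLine 3 \r\n"); // sample data
        // Print with CR/LF made visible to check for doubled carriage returns.
        System.out.println(xml.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

With the separator set to \n, each \r\n in the payload should come out as \r\n again, because the \r is copied through and the \n is replaced by a single line feed.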