How to preserve newlines in CDATA when generating XML?


I want to write some text that contains whitespace characters, such as newlines and tabs, into an XML file, so I use:

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

But when I read this back in using:

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that no longer has any newlines.
When I look directly at the XML on disk, the newlines seem to be preserved, so the problem occurs when reading the XML file back in.

How can I preserve the newlines?

Thanks!


Solution

  • I don't know how you parse and write your document, but here's an enhanced code example based on yours:

    // imports needed by this snippet
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSSerializer;

    // creating the document in-memory
    Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

    Element element = xmldoc.createElement("TestElement");
    xmldoc.appendChild(element);
    element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

    // serializing the xml to a string
    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

    DOMImplementationLS impl =
        (DOMImplementationLS) registry.getDOMImplementation("LS");

    LSSerializer writer = impl.createLSSerializer();
    String str = writer.writeToString(xmldoc);

    // printing the xml for verification of whitespace in cdata
    System.out.println("--- XML ---");
    System.out.println(str);

    // de-serializing the xml from the string; writeToString declares
    // encoding="UTF-16" in the prolog, so the bytes must be UTF-16 as well
    final ByteArrayInputStream input =
        new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_16));
    Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

    // the CDATA section is the first (and only) child of TestElement
    Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
    final Node child = vs.getFirstChild();
    String x = child.getNodeValue();

    // print the value, yay!
    System.out.println("--- Node Text ---");
    System.out.println(x);
    

    Serializing with LSSerializer is the W3C way to do it (it is part of the DOM Level 3 Load and Save specification). The output is as expected, with line separators:

    --- XML ---
    <?xml version="1.0" encoding="UTF-16"?>
    <TestElement><![CDATA[first line
    second line
    ]]></TestElement>
    --- Node Text ---
    first line
    second line
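
Since the question writes the XML to disk rather than to a string, the same round trip can also be sketched through a file. This is only a minimal sketch under a few assumptions: the class name `CdataFileRoundTrip`, the helper `roundTrip`, the temporary file, and the choice of UTF-8 are all made up for illustration and are not part of the original code. It reads the value back with `getTextContent()` on the element instead of `getFirstChild().getNodeValue()`, which avoids depending on the CDATA section being the first child node.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of the original question or answer.
class CdataFileRoundTrip {

    // Writes `text` into a CDATA section of a temporary XML file, parses the
    // file again, and returns the text content of <TestElement>.
    static String roundTrip(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Build the document in memory, as in the answer above.
        Document doc = builder.newDocument();
        Element element = doc.createElement("TestElement");
        doc.appendChild(element);
        element.appendChild(doc.createCDATASection(text));

        // Serialize with the DOM Level 3 Load and Save API.
        DOMImplementationLS impl = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8"); // assumption: any encoding the parser supports would do

        File file = File.createTempFile("cdata-test", ".xml");
        file.deleteOnExit();
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            output.setByteStream(out);
            writer.write(doc, output);
        }

        // Parse the file from disk and read the element's text content,
        // which includes the content of any CDATA sections inside it.
        Document parsed = builder.parse(file);
        Node node = parsed.getElementsByTagName("TestElement").item(0);
        return node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String original = "first line\n\tsecond line\n";
        String readBack = roundTrip(original);
        System.out.println(original.equals(readBack)
                ? "whitespace preserved"
                : "whitespace changed");
    }
}
```

If the comparison fails in your own setup, the difference is likely in how the file is written or parsed (for example, a pretty-printing step that adds whitespace-only text nodes around the CDATA section), not in the CDATA section itself.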