Prettify XML in org.w3c.dom.Document to file


Summary: I want to save an org.w3c.dom.Document to a file with nice indentation (pretty-print it). The code below, which uses a Transformer, does the job in some cases but not in all of them (see the example). Can you help me fix this?

I have an org.w3c.dom.Document (not an org.jdom.Document) and want to format it nicely and write it to a file automatically. How can I do that? I tried the following, but it doesn't work if there are additional newlines in the document:

import java.io.ByteArrayInputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

public class Main {
    public static void main(String[] args) {
        try {
            String input = "<asdf>\n\n<a>text</a></asdf>";
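            // Encode explicitly as UTF-8 instead of relying on the platform default charset
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(input.getBytes(java.nio.charset.StandardCharsets.UTF_8)));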

            System.out.println("-- input -------------------\n" + input + "\n----------------------------");
            System.out.println("-- output ------------------");
            prettify(doc);
            System.out.println("----------------------------");

        } catch (Exception e) {
            e.printStackTrace(); // report parse errors instead of silently swallowing them
        }
    }

    public static void prettify(Document doc) {
        try {
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            transformer.transform(new DOMSource(doc), new StreamResult(System.out));
        } catch (Exception e) {
            e.printStackTrace(); // report transformer errors instead of silently swallowing them
        }
    }
}

I have directed the output to System.out so that you can run it easily wherever you want (for instance on Ideone.com). You can see that the output is not pretty. If I remove the \n\n from the input string, everything is fine. However, the document usually doesn't come from a string but from a file, and it gets modified heavily before I want to prettify it.

The Transformer seems to be the right way to do this, but I am missing something. Can you tell me what I am doing wrong?

SSCCE output:

-- input -------------------
<asdf>

<a>text</a></asdf>
----------------------------
-- output ------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<asdf>

<a>text</a>
</asdf>
----------------------------

Expected output:

-- input -------------------
<asdf>

<a>text</a></asdf>
----------------------------
-- output ------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<asdf>
  <a>text</a>
</asdf>
----------------------------

Solution

  • Try this:

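    It needs org.apache.xml.serialize.XMLSerializer and org.apache.xml.serialize.OutputFormat (both ship with Apache Xerces):

    import java.io.StringWriter;
    import java.io.Writer;
    import org.apache.xml.serialize.OutputFormat;
    import org.apache.xml.serialize.XMLSerializer;

    OutputFormat format = new OutputFormat(document); // document is an instance of org.w3c.dom.Document
    format.setIndenting(true);  // enable indentation, then set explicit values
    format.setIndent(2);        // two spaces per nesting level
    format.setLineWidth(65);    // wrap long lines at 65 columns
    Writer out = new StringWriter();
    XMLSerializer serializer = new XMLSerializer(out, format);
    serializer.serialize(document); // may throw java.io.IOException

    String formattedXML = out.toString();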
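
  • A possible alternative that keeps the JDK Transformer from the question:

    The indentation problem appears to come from the whitespace-only text nodes that the parser keeps between elements (the \n\n in the example input). One option is to remove those nodes before transforming. The sketch below is not part of the original answer. The class name PrettyPrintSketch and the helper stripWhitespaceNodes are hypothetical names chosen for illustration, and it assumes the JDK's built-in Transformer, which honours the indent-amount property used in the question. The exact layout (for instance, whether a line break follows the XML declaration) can differ between JDK versions.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PrettyPrintSketch {

    // Hypothetical helper: removes every text node that contains only whitespace,
    // so leftover newlines no longer interfere with the serializer's indentation.
    public static void stripWhitespaceNodes(Document doc) throws Exception {
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
    }

    // Same Transformer setup as in the question, but writing to a String.
    // Note: this modifies the passed document by stripping whitespace-only nodes.
    public static String prettify(Document doc) throws Exception {
        stripWhitespaceNodes(doc);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<asdf>\n\n<a>text</a></asdf>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
        System.out.println(prettify(doc));
    }
}
```

    To write to a file instead of a String, pass new StreamResult(new java.io.File(...)) to transform. Keep in mind that stripping whitespace-only nodes also changes mixed content in which whitespace between inline elements is significant, so this is only appropriate for data-style XML.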