Java XML Pretty Print with Commented blocks


I have the code below to pretty-print a given XML string.

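import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public void prettyPrintXML(String xmlString) {
    try {
        Source xmlInput = new StreamSource(new StringReader(xmlString));
        StringWriter stringWriter = new StringWriter();
        StreamResult xmlOutput = new StreamResult(stringWriter);
        // A transformer created without a stylesheet copies the input unchanged
        // and only applies the output properties set below.
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // Implementation-specific property understood by the JDK's built-in serializer.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        transformer.transform(xmlInput, xmlOutput);
        System.out.println("OutPutXML : ");
        System.out.println(xmlOutput.getWriter().toString());
    } catch (Exception e) {
        e.printStackTrace();
    }
}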

Here is a sample input and the output the code above produces:

InputXML :
<employees><employee><name>John</name><age>18</age></employee><!--employee><name>Smith</name><age>27</age></employee--></employees>

OutPutXML : 
<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <name>John</name>
        <age>18</age>
    </employee>
    <!--employee><name>Smith</name><age>27</age></employee-->
</employees>

I need the commented-out block in the output above to be formatted like this:

<!--employee>
   <name>Smith</name>
   <age>27</age>
</employee-->

Is there a way to do this in Java without using any external libraries?


Solution

  • No, the standard libraries do not support this out of the box. Getting that kind of behaviour takes a fair amount of work: you have to parse the comment's content as XML and inherit the indentation level from the parent node. You also run the risk of confusing comments that contain plain text with those that contain XML.

    I have, however, implemented such a processor: xmlformatter. It also handles XML in text and CDATA nodes, and does so robustly (i.e. it does not fail on invalid XML within comments).

    From

    <parent><child><!--<comment><xml/></comment>--></child></parent>
    

    you'll get

    <parent>
        <child>
            <!--
            <comment>
                <xml/>
            </comment>-->
        </child>
    </parent>
    

    which I think will be a bit more readable than your desired output.
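
For anyone who wants to stay with the JDK only, the approach described above (parse the comment as XML, then inherit the indentation from the parent) can be sketched roughly as follows. This is a minimal illustration, not how xmlformatter works: the class name `CommentPrettyPrinter` and its helper methods are made up for this example, the indent width of 4 is assumed from the question, and the `indent-amount` property is specific to the JDK's built-in serializer. It tries the comment text as-is and, failing that, in the question's style where the outer `<` and `>` are left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration; not part of any library.
final class CommentPrettyPrinter {
    private static final int INDENT = 4; // assumed indent width, as in the question

    static String prettyPrint(String xml) {
        try {
            Document doc = parse(xml);
            reformatComments(doc, 0);
            return serialize(doc, false);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Walks the tree; the children of 'node' sit at indentation level 'depth'.
    private static void reformatComments(Node node, int depth) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Comment) {
                reformat((Comment) child, depth);
            } else {
                reformatComments(child, depth + 1);
            }
        }
    }

    private static void reformat(Comment comment, int depth) {
        String text = comment.getData().trim();
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < depth * INDENT; i++) {
            pad.append(' ');
        }
        // First try the text as-is, then with the outer '<' and '>' restored.
        for (boolean bare : new boolean[] {false, true}) {
            String candidate = bare ? "<" + text + ">" : text;
            try {
                String[] lines = serialize(parse(candidate), true).trim().split("\\R");
                // The serializer indents the "<!--" itself, so only the
                // continuation lines need the parent's indentation.
                StringBuilder data = new StringBuilder(lines[0]);
                for (int i = 1; i < lines.length; i++) {
                    data.append('\n').append(pad).append(lines[i]);
                }
                String result = data.toString();
                if (bare) {
                    // Drop the '<' and '>' again to keep the original comment style.
                    result = result.substring(1, result.length() - 1);
                }
                comment.setData(result);
                return;
            } catch (Exception e) {
                // Not well-formed in this form; try the next one or leave the comment alone.
            }
        }
    }

    private static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    private static String serialize(Document doc, boolean omitDeclaration) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(INDENT));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling `CommentPrettyPrinter.prettyPrint(xmlString)` on the question's input should put each element of the commented-out block on its own line, indented relative to `<employees>`. The exact whitespace (for example around the XML declaration) depends on the JDK's serializer and may differ between versions. Comments that hold plain text, several top-level elements, or malformed XML fail both parse attempts and are left exactly as they were, which is the plain-text-versus-XML risk mentioned above handled in the crudest possible way.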