I'm struggling to wrap HTML content in a CDATA section using Java. The ultimate goal is to transform XML to HTML via XSLT, and CDATA is a requirement. I want the XSLT to ignore the HTML, but I'm obviously doing something wrong, since the HTML is not preserved. Here is the XML:
<?xml version="1.0" encoding="utf-8" ?>
<content>
<records>
<record>
<param1>1</param1>
<param2>25</param2>
<param3>34</param3>
<param4>b</param4>
<param5>
<p>this is html that should be wrapped with CData including the p tags.</p>
</param5>
</record>
</records>
</content>
Here is the code:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("test.xml");
doc.getDocumentElement().normalize();
Element param5 = (Element)doc.getElementsByTagName("param5").item(0);
CDATASection cdata = doc.createCDATASection(param5.getTextContent());
param5.appendChild(cdata);
DOMResult domResult = new DOMResult();
transform.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "param5");
transform.transform(new DOMSource(doc) , domResult);
So for param5, the XML just before transformation looks like this:
<param5>
<![CDATA[
this is html that should be wrapped with CData including the p tags.
]]>
</param5>
What I want instead is:
<param5>
<![CDATA[
<p>this is html that should be wrapped with CData including the p tags.</p>
]]>
</param5>
I am lost as to what I'm doing wrong here.
Any help would be most appreciated. Thank you.
The XSL is very simple:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h1><xsl:value-of select="content/records/record/param5"/></h1>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Here is the sample HTML output that I need:
<html>
<body>
<h1>
<p>this is html that should be wrapped with CData including the p tags.</p>
</h1>
</body>
</html>
I'm trying not to overcomplicate things. The basic problem is that I want the CDATA section to include both the HTML content and the HTML tags, but getTextContent() ignores the p tags. If there were a method that could grab everything inside param5, I'd be set.
If you want to create a CDATA section containing the markup of DOM nodes, you first need to serialize those nodes, which can be done in Java using either a default transformer or the DOM Load/Save API. So I would create a document fragment node, appendChild all child nodes of the param element to the document fragment, and then serialize the document fragment to a string. After that you can use your code to create a CDATA section from the string and appendChild it to the param element.
Here is a simple example. The imports needed are:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
Then the code to read in the document and find the element is as you posted, and a DocumentFragment is used to collect all child nodes removed from the element:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("sample1.xml");
DocumentFragment frag1 = doc.createDocumentFragment();
Element param = (Element)doc.getElementsByTagName("param5").item(0);
while (param.hasChildNodes())
{
    // appendChild moves the node, so the loop ends once param is empty
    frag1.appendChild(param.getFirstChild());
}
Then the LSSerializer has a writeToString method:
DOMImplementationLS lsImp = (DOMImplementationLS)doc.getImplementation();
LSSerializer ser = lsImp.createLSSerializer();
ser.getDomConfig().setParameter("xml-declaration", false);
String xml = ser.writeToString(frag1);
System.out.println(xml);
param.appendChild(doc.createCDATASection(xml));
System.out.println(ser.writeToString(doc));
The document then looks like
<content>
<records>
<record>
<param1>1</param1>
<param2>25</param2>
<param3>34</param3>
<param4>b</param4>
<param5><![CDATA[
<p>this is html that should be wrapped with CData including the p tags.</p>
]]></param5>
</record>
</records>
</content>
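The other route mentioned at the start of this answer, a default transformer, can be sketched roughly as follows. This is only a sketch, assuming the identity transformer that ships with the JDK; the class name CdataWrapWithTransformer and its helper methods are made up for illustration. Each child is serialized on its own rather than through a DocumentFragment, because I am not sure every transformer implementation accepts a fragment as a DOMSource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class CdataWrapWithTransformer {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes each child of 'parent' to markup, removes the children,
    // and appends the collected markup as a single CDATA section.
    static void wrapChildrenInCdata(Element parent) throws Exception {
        // A transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter markup = new StringWriter();
        while (parent.hasChildNodes()) {
            Node child = parent.getFirstChild();
            identity.transform(new DOMSource(child), new StreamResult(markup));
            parent.removeChild(child);
        }
        Document doc = parent.getOwnerDocument();
        parent.appendChild(doc.createCDATASection(markup.toString()));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<content><param5><p>this is html</p></param5></content>");
        Element param5 = (Element) doc.getElementsByTagName("param5").item(0);
        wrapChildrenInCdata(param5);
        // The only child of param5 is now a CDATA section holding the markup.
        System.out.println(param5.getFirstChild().getNodeValue());
    }
}
```

Keep in mind that a markup string inside CDATA is plain text as far as XSLT is concerned, so xsl:value-of will escape the angle brackets when it writes the result unless the stylesheet deals with that.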
Someone at home in the Java world needs to tell you whether the cast in DOMImplementationLS lsImp = (DOMImplementationLS)doc.getImplementation(); is reliable, or whether you need to use the registry instead, as shown in http://www.java2s.com/Tutorial/Java/0440__XML/GeneratesaDOMfromscratchWritestheDOMtoaStringusinganLSSerializer.htm.
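For what the registry route might look like, here is a hedged sketch. It tries the cast first and only falls back to DOMImplementationRegistry when the document's implementation does not offer Load/Save. The class name LsSerializerLookup and the method newSerializer are hypothetical names chosen for this example. I have not tested it against parsers other than the one bundled with the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of any library.
class LsSerializerLookup {

    // Returns an LSSerializer that omits the XML declaration.
    static LSSerializer newSerializer(Document doc) throws Exception {
        DOMImplementation impl = doc.getImplementation();
        if (!(impl instanceof DOMImplementationLS)) {
            // Fall back to the registry and ask for any implementation
            // that supports the "LS" (Load/Save) feature.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            impl = registry.getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM Load/Save implementation found");
            }
        }
        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        doc.appendChild(doc.createElement("content"));
        System.out.println(newSerializer(doc).writeToString(doc));
    }
}
```

The instanceof check keeps the simple cast for the common case and avoids a ClassCastException when a different parser is on the classpath.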