After a lot of searching I decided to ask here. Below is sample code that reproduces my problem. The Document object is built with a Chinese character.
String value= "𧀠";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("value");
root.setAttribute("attribute", value);
doc.appendChild(root);
DOMSource source = new DOMSource(doc);
I am trying to convert the document source to a string using the Transformer class with the code below.
ByteArrayOutputStream outStream = null;
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(source, htmlStreamResult);
outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
String outPut = outStream.toString( "UTF-8" );
But the output I get has the Chinese character converted, as shown below.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><value attribute="𧀠"/>
I do not want the Chinese character to be converted; it should appear as it is. I would appreciate any help with this.
Change UTF-8 to UTF-16. Since you're making a String (which is code-page agnostic), this has no ill effect on the encoding. It does, however, add an encoding declaration to the XML header and sometimes a BOM (byte-order mark). You can optionally leave the header out and attach your own.
String value= "𧀠かな〜"; // (I don't see your character so I added some of my own)
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("value");
root.setAttribute("attribute", value);
doc.appendChild(root);
DOMSource source = new DOMSource(doc);
ByteArrayOutputStream outStream = null;
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
// transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // optional
transformer.transform(source, htmlStreamResult);
outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
String outPut = outStream.toString( "UTF-16" );
System.out.println(outPut);
Output:
<?xml version="1.0" encoding="UTF-16" standalone="no"?><value attribute="𧀠かな〜"/>
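If you want to compare the two encodings side by side, here is a minimal self-contained sketch of the same approach. The class name XmlEncodingDemo and the helpers buildDocument, serialize and readAttribute are my own hypothetical names, not part of any library. It serializes the document with a given encoding, decodes the bytes with that same encoding, and then parses the result back to check that the attribute value survives. The test character is written as a surrogate-pair escape only to keep the source ASCII; it is a supplementary-plane CJK character like the one in the question. Exactly how the character is written out may depend on the Transformer implementation your JDK ships, so treat the printed text as something to inspect rather than a guaranteed result.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlEncodingDemo {

    // Builds <value attribute="..."/> as in the question.
    static Document buildDocument(String attributeValue) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("value");
        root.setAttribute("attribute", attributeValue);
        doc.appendChild(root);
        return doc;
    }

    // Serializes to bytes in the given encoding, then decodes with the same encoding.
    static String serialize(Document doc, String encoding) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString(encoding);
    }

    // Parses the serialized XML from a character stream and returns the attribute value.
    static String readAttribute(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document parsed = builder.parse(new InputSource(new StringReader(xml)));
        return parsed.getDocumentElement().getAttribute("attribute");
    }

    public static void main(String[] args) throws Exception {
        // A supplementary-plane CJK character, written as a surrogate pair.
        String value = "\uD85C\uDC20";
        Document doc = buildDocument(value);
        for (String encoding : new String[] {"UTF-8", "UTF-16"}) {
            String xml = serialize(doc, encoding);
            System.out.println(encoding + ": " + xml);
            System.out.println("round-trips: " + value.equals(readAttribute(xml)));
        }
    }
}
```

Even when the character shows up in escaped form in the serialized text, an XML parser should read it back as the same character, so the conversion only matters if something other than an XML parser consumes the string.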