Tags: java, utf-8, jsoup

Jsoup.Element.text() not correctly encoding utf-8


I am working on a project in Eclipse with JDK 1.8. My client recently asked for support for saving and retrieving data in Arabic letters as well. I added useUnicode=true&characterEncoding=UTF-8 to the JDBC URL. Saving the data now works correctly, and the responses come back UTF-8 encoded. To get that working I added

path = "/v2",consumes="application/json;charset=UTF-8", produces = "application/json;charset=UTF-8"

to all my controllers. I have an API that generates labels: it uses jsoup to edit an HTML template and then converts the result to PDF with wkhtmltopdf. This function works correctly when the name is in English:

org.jsoup.nodes.Document doc = Jsoup.parse(template, "UTF-8", "");
Element customerName = doc.getElementById("name");
customerName.text(orderAddress.getName());

If orderAddress.getName() is in Arabic, I get ????? instead. Printing the value to the console gives the same result: logger.debug("Name:"+orderAddress.getName());
Eclipse is configured to use UTF-8. I also tried this:

customerName.text(new String(orderAddress.getName().getBytes(),"UTF-8"));
logger.debug("Name:"+new String(orderAddress.getName().getBytes(),"UTF-8"));

That gave the same result. In my unit test I used customerName.text("فاسيلة"); and it worked correctly, generating exactly the PDF I needed.
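A note on the new String(name.getBytes(), "UTF-8") attempt: getBytes() with no argument encodes with the platform default charset. If that default cannot represent Arabic letters, each one is replaced by '?' before the bytes are ever decoded, so the outer "UTF-8" cannot bring the text back. If the default is already UTF-8, the round trip is a no-op. The sketch below illustrates this with the JDK only. ISO-8859-1 is an assumption standing in for a non-UTF-8 platform default, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encodes with a charset that has no Arabic letters, then decodes as UTF-8.
    // ISO-8859-1 stands in for a non-UTF-8 platform default (an assumption).
    static String lossyRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encoding and decoding with the same charset returns the original text.
    static String utf8RoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Arabic sample from the question, written as escapes so the
        // source file's own encoding cannot affect the demonstration.
        String name = "\u0641\u0627\u0633\u064A\u0644\u0629";
        System.out.println(lossyRoundTrip(name).equals("??????")); // true
        System.out.println(utf8RoundTrip(name).equals(name));      // true
    }
}
```

A Java String is already Unicode in memory, so it normally needs no such conversion. The charset only matters at the points where the text is turned into bytes, such as writing a file or printing to a console.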

I have seen a few similar questions, but none of them solved my issue. Since GET works fine, I am sure that retrieving the data from the DB is not the problem. Since the unit test works, the encoding on that end is fine too. So I must be missing something related to jsoup. What am I missing? If anyone knows, please help.


Solution

  • I solved it by using UTF-8 when writing the string to the output file:

    FileUtils.writeStringToFile(tempHTML, doc.outerHtml(), "UTF-8");

    There is no need to change the encoding to "ISO-8859-9"; keep the parse call as Jsoup.parse(template, "UTF-8", "");
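The fix comes down to naming the charset explicitly at the point where the HTML string becomes bytes on disk, rather than relying on the platform default. FileUtils.writeStringToFile appears to come from Apache Commons IO. A minimal JDK-only sketch of the same idea follows. The class and method names are hypothetical, and the html string is a stand-in for the value of doc.outerHtml().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8HtmlWriter {
    // Writes the HTML with an explicit UTF-8 encoding instead of the platform default.
    static void writeUtf8(Path target, String html) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back as UTF-8 so the round trip can be checked.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for doc.outerHtml(); the name is the Arabic sample from the question.
        String html = "<p id=\"name\">\u0641\u0627\u0633\u064A\u0644\u0629</p>";
        Path tempHtml = Files.createTempFile("label", ".html");
        writeUtf8(tempHtml, html);
        System.out.println(readUtf8(tempHtml).equals(html)); // true
        Files.delete(tempHtml);
    }
}
```

Whatever reads the file next, here the PDF converter, also has to treat it as UTF-8 for the Arabic text to render correctly.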