java encoding character-encoding freemarker iso-8859-15

FreeMarker special character output as question mark


I am trying to submit a form whose fields contain special characters such as €ŠšŽžŒœŸ. According to the ISO-8859-15 Wikipedia page, these characters are part of that standard. The encoding for both the request and the response is set to ISO-8859-15, yet when I display the values (using FreeMarker 2.3.18 in a Java EE environment) they come out as ???????. I have set the form's accepted charset to ISO-8859-15 and verified with Firebug that the form is submitted with content-type text/html;charset=ISO-8859-15, but I can't figure out how to display the correct characters. When I run the following code, the correct hex value is displayed (e.g. Ÿ = be):

System.out.println(Integer.toHexString(myString.charAt(i)));

What am I missing? Thank you in advance!

EDIT:

This is the code I use to process the request:

PrintStream ps = new PrintStream(System.out, true, "ISO-8859-15");
String firstName = request.getParameter("firstName");

// check for null before
for (int i = 0; i < firstName.length(); i++) {
     ps.println(firstName.charAt(i)); // prints "?"
}

BufferedWriter file = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(path), "ISO-8859-15"));
file.write(firstName); // writes "?" to file (checked with Notepad++, correct encoding set)
file.close();

Solution

  • According to the hex value, the form data is submitted correctly, so the problem seems to be on the output side. When Java encodes text, it replaces any character that cannot be represented in the charset in use with ?.

    You have to specify the correct charset when constructing the output stream. What code do you use for that? I do not know FreeMarker, but there will probably be something like this:

    Writer out = new OutputStreamWriter(System.out);
    

    This should be replaced with something resembling this:

    Writer out = new OutputStreamWriter(System.out, "iso-8859-15");
    

    By the way, UTF-8 is usually a much better choice of encoding.
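
To see the replacement behaviour in isolation, here is a minimal sketch that pushes a string through an OutputStreamWriter with different charsets and prints the resulting bytes as hex. The class name CharsetDemo and the helper encodeToHex are made up for illustration and are not part of FreeMarker or the original code. It also assumes the ISO-8859-15 charset is available in the JVM, which is normally the case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical helper: writes text through an OutputStreamWriter that uses
    // the given charset and returns the encoded bytes as a lowercase hex string.
    static String encodeToHex(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buffer, charset)) {
            out.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        String yDiaeresis = "\u0178"; // Ÿ, written as an escape to avoid source-encoding issues

        System.out.println(encodeToHex(yDiaeresis, latin9));                      // be
        System.out.println(encodeToHex(yDiaeresis, StandardCharsets.US_ASCII));   // 3f, i.e. "?"
        System.out.println(encodeToHex(yDiaeresis, StandardCharsets.ISO_8859_1)); // 3f, i.e. "?"
        System.out.println(encodeToHex(yDiaeresis, StandardCharsets.UTF_8));      // c5b8
        // U+00BE (¾) is not part of ISO-8859-15, so it is replaced as well.
        System.out.println(encodeToHex("\u00be", latin9));                        // 3f, i.e. "?"
    }
}
```

The last case may be worth a second look: a Java char is a UTF-16 code unit, so a correctly decoded Ÿ prints as 178 with Integer.toHexString, not be. If the string really contains the character U+00BE, the request parameter was possibly decoded as ISO-8859-1 rather than ISO-8859-15, and writing it out as ISO-8859-15 would then produce ? as shown above.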