Java's UTF-8 encoding


I have this code:

BufferedWriter w = Files.newWriter(file, Charsets.UTF_8);
w.newLine();
StringBuilder sb = new StringBuilder();
sb.append("\"").append("éééé").append("\";");
w.write(sb.toString());

But it doesn't work: the resulting file isn't UTF-8 encoded. I tried doing this when writing:

w.write(new String(sb.toString().getBytes(Charsets.US_ASCII), "UTF8"));

It made question marks appear everywhere in the file...

I found that there was a bug regarding the recognition of the initial BOM character (http://bugs.java.com/view_bug.do?bug_id=4508058), so I tried using the BOMInputStream class. But bomIn.hasBOM() always returns false, so I guess my problem is probably not BOM-related.

How can I get my file encoded in UTF-8? Was this problem solved in Java 8?


Solution

  • You're writing UTF-8 correctly in your first example (although you're redundantly creating a String from a String)

    The problem is that the viewer or tool you're using to view the file doesn't read the file as UTF-8.

    Don't mix in ASCII: encoding the string with US_ASCII just replaces every non-ASCII character with a question mark.
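
One way to check both points without relying on a viewer is to read the raw bytes back. The following is only a minimal sketch: it uses the JDK's java.nio.file.Files.newBufferedWriter instead of the Files.newWriter call from the question (which looks like Guava's), and the class and method names (Utf8WriteCheck, writeUtf8, asciiRoundTrip) are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question or answer.
class Utf8WriteCheck {

    // Writes the text as UTF-8 and returns the raw bytes that ended up on disk.
    static byte[] writeUtf8(Path file, String text) throws IOException {
        // try-with-resources closes (and therefore flushes) the writer
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return Files.readAllBytes(file);
    }

    // Reproduces the attempted "fix": encode as US-ASCII, then decode as UTF-8.
    static String asciiRoundTrip(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII); // é cannot be mapped, becomes '?'
        return new String(ascii, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Same content as in the question; é is written as \u00e9 so the
        // example does not depend on the encoding of this source file.
        String text = "\"\u00e9\u00e9\u00e9\u00e9\";";
        Path file = Files.createTempFile("utf8-check", ".txt");

        byte[] onDisk = writeUtf8(file, text);
        // In UTF-8, é (U+00E9) is stored as the two bytes 0xC3 0xA9.
        System.out.println(Arrays.equals(onDisk, text.getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(onDisk.length);        // 11 (3 ASCII bytes + 4 x 2 bytes)
        System.out.println(asciiRoundTrip(text)); // "????";

        Files.delete(file);
    }
}
```

If the byte check passes but the file still looks wrong, the remaining suspect is the program used to open it, which would need to be told to read the file as UTF-8.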