I read a UTF-8 file like this:
br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), Charset.forName("UTF-8")));
I would like to know what the charset of the returned String is after I invoke br.readLine().
Eclipse on my computer uses "GBK" as the default charset.
Technically, the file is being read using the UTF-8 charset, because you told the InputStreamReader to do so: the underlying bytes of the file content are interpreted as UTF-8. The readLine() method returns a String, which stores its characters internally as UTF-16, Java's own representation. The String itself carries no charset; it is just a sequence of characters.
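A minimal sketch of that point, assuming an in-memory byte array stands in for the file so the example is self-contained (the class ReadLineDemo and the helper firstLine are hypothetical names). The 9 UTF-8 bytes are decoded by the InputStreamReader, and readLine() hands back a String of plain UTF-16 chars:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadLineDemo {
    // Decodes UTF-8 bytes through an InputStreamReader and returns the first line.
    // A ByteArrayInputStream is used here instead of a FileInputStream (assumption).
    static String firstLine(byte[] utf8Bytes) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+4E2D, U+6587 (3 UTF-8 bytes each) and U+00E9 (2 bytes), plus '\n': 9 bytes total.
        byte[] fileContent = "\u4E2D\u6587\u00E9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(fileContent.length); // 9

        String line = firstLine(fileContent);
        // The String holds 3 chars (UTF-16 code units); no trace of UTF-8 remains.
        System.out.println(line.length());        // 3
        System.out.println((int) line.charAt(0)); // 20013, i.e. U+4E2D
    }
}
```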
What happens after that depends entirely on what you do with this String. If you write it back to a file using a Writer without specifying a charset, the platform's default charset is used. If you print it to stdout, the charset of stdout is used, which depends on the runtime environment (command console? IDE? etc.). If you save it in a database, it depends on the JDBC driver configuration and/or the DB table encoding. And so on.
Apparently you're printing it to stdout in Eclipse's console with System.out.println(). In that case, the GBK charset is used to display the characters, which garbles any characters read from the UTF-8 file that GBK cannot represent. You need to configure Eclipse to use UTF-8 as the text file encoding. That can be done via Window > Preferences > General > Workspace > Text file encoding.
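The following sketch shows the loss in isolation: it round-trips a String through a charset, and a character outside GBK (here an emoji, U+1F600, chosen as an assumption of a character GBK lacks) does not survive, while UTF-8 preserves everything. ConsoleDemo and roundTrip are hypothetical names. The last lines show a possible code-level alternative to the IDE setting, wrapping stdout in a UTF-8 PrintStream; this only helps if the console itself decodes the output as UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleDemo {
    // Encodes text with a charset and decodes it again. Characters the charset
    // cannot represent are replaced during encoding, so they are lost.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u4E2D\u6587 \uD83D\uDE00"; // two Chinese characters, a space, an emoji
        if (Charset.isSupported("GBK")) {
            // The Chinese characters survive GBK; the emoji does not.
            System.out.println(roundTrip(text, Charset.forName("GBK")).equals(text)); // false
        }
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // Hypothetical alternative to the IDE setting: encode stdout as UTF-8.
        PrintStream utf8Out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println(text);
    }
}
```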