java, character-encoding, inputstreamreader

What's the charset of text returned by InputStreamReader(InputStream in, Charset cs)?


I read a UTF-8 file by:

br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), Charset.forName("UTF-8")));

What is the charset of the String returned when I invoke br.readLine()?

Eclipse on my computer uses "GBK" as its default charset.


Solution

  • Technically, the file is being read as UTF-8, because that is what you told the InputStreamReader to use: the underlying bytes of the file content are decoded as UTF-8. The readLine() method returns a String, which represents its characters in Java's own UTF-16 encoding.

    What happens after that depends entirely on what you do with this String. If you write it back to a file using a Writer without specifying a charset, the platform default is used. If you print it to stdout, the stdout's default charset is used, which depends on the runtime environment (command console, IDE, and so on). If you save it in a database, the result depends on the JDBC driver configuration and/or the DB table encoding. And so on.

    You are apparently printing it to stdout in Eclipse's console with System.out.println(). In that case the characters are encoded for display using GBK, so any characters read from the UTF-8 file that GBK does not cover will come out garbled. To fix this, configure Eclipse to use UTF-8 as the text file encoding via Window > Preferences > General > Workspace > Text file encoding.
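
To make the decode/encode distinction concrete, here is a minimal sketch. The class name CharsetDemo and the helper readFirstLine are hypothetical, and a ByteArrayInputStream stands in for the FileInputStream from the question so the example runs without a file. It also assumes the GBK charset is available in your JRE, which is normally true for a full JDK but not guaranteed on a stripped-down runtime.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Hypothetical helper: decodes the first line of UTF-8 bytes using the
    // same reader chain as in the question (in-memory stream instead of a file).
    static String readFirstLine(byte[] utf8Bytes) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Two CJK characters written as Unicode escapes, so the encoding of
        // this source file cannot influence the result.
        byte[] fileBytes = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);

        // Decoding step: bytes -> String. The String itself has no "UTF-8" or
        // "GBK" label attached; it is a sequence of UTF-16 code units.
        String line = readFirstLine(fileBytes);
        System.out.println(line.length()); // 2

        // Encoding step: the same String yields different bytes per charset.
        System.out.println(line.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(line.getBytes(Charset.forName("GBK")).length); // 4 (assumes GBK is installed)

        // Printing with an explicit charset instead of relying on the default.
        // Whether this displays correctly still depends on how the console
        // decodes the bytes it receives.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(line);
    }
}
```

The point of the sketch is that a charset only matters at the boundaries: once when bytes are decoded into a String, and again when the String is encoded back into bytes for a file, console, or database.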