Tags: java, encoding, utf-8, character-encoding, nio

Reading file with bad encoding. CP1252 vs UTF-8


I have a byte array that I wrap in an InputStreamReader and then process further:

Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr));

The JVM's default encoding is cp1252, but the file I read into the byte array is encoded in UTF-8 and contains German umlauts. When I pass the byte array to the InputStreamReader, Java decodes the umlauts into the wrong characters. For example, ü shows up as Ã¼. I tried passing "UTF-8" and Charset.forName("UTF-8").newDecoder() to the InputStreamReader constructor, and I tried re-encoding the strings read from the reader with new String(oldStr.getBytes("cp1252"), "UTF-8"), but neither helped. In the debugger, the reader variable holds a StreamDecoder whose "decoder" field is an MS1252$Decoder. Maybe that is the key to my problem, but I don't understand how to fix it.


Solution

  • Use the InputStreamReader(InputStream in, String charsetName) constructor and specify the charset explicitly, so the reader does not fall back to the JVM default:

    Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr), "UTF-8");
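
    To check that the charset argument really is what changes the result, here is a minimal, self-contained sketch. It is only an illustration: the class name DecodeDemo, the helper decode, and the sample text are made up for this example, and it assumes the windows-1252 charset is available in your JDK (it is in common JDK builds). It uses the InputStreamReader(InputStream, Charset) overload with StandardCharsets.UTF_8, which behaves like the String-based constructor above but avoids a misspelled charset name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Hypothetical helper: reads the whole byte array as text using the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u00fc" is the letter u with umlaut; the escape keeps this file independent of its own encoding.
        byte[] byteArr = "f\u00fcr".getBytes(StandardCharsets.UTF_8);

        String right = decode(byteArr, StandardCharsets.UTF_8);
        String wrong = decode(byteArr, Charset.forName("windows-1252"));

        // The umlaut is two bytes in UTF-8, so decoding as cp1252 yields two characters for it.
        System.out.println("UTF-8 decode matches original: " + right.equals("f\u00fcr"));
        System.out.println("cp1252 decode is mojibake:     " + wrong.equals("f\u00c3\u00bcr"));
    }
}
```

    If the UTF-8 decode still looks wrong in your program, it may be worth checking that byteArr really contains the original UTF-8 bytes and was not already converted through a String somewhere earlier.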