java utf-8 gb2312

Converting UTF-8 to GB2312 in Java


Just look at the code below:

    try {
        String str = "上海上海";
        String gb2312 = new String(str.getBytes("utf-8"), "gb2312");
        String utf8 = new String(gb2312.getBytes("gb2312"), "utf-8");
        System.out.println(str.equals(utf8));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

It prints false!

I ran this code under both JDK 7 and JDK 8, and my IDE's file encoding is set to UTF-8.

Can anyone help me?


Solution

  • What you are looking for is the encoding/decoding that happens when you read input and write output.

    As @kalpesh said, internally it is all Unicode: a Java String holds characters, not UTF-8 or GB2312 bytes, so there is nothing to convert inside the String itself. If you want to READ a stream in one encoding and then WRITE it in a different one, you have to specify the encoding for the conversion from bytes (in the input stream) to strings (in Java), and again for the conversion from strings (in Java) to bytes (in the output stream), like so:

            // Decode the input bytes as UTF-8 and encode the output bytes as GB2312.
            InputStream is = new FileInputStream("utf8_encoded_text.txt");
            OutputStream os = new FileOutputStream("gb2312_encoded.txt");

            Reader r = new InputStreamReader(is, "utf-8");
            BufferedReader br = new BufferedReader(r);
            Writer w = new OutputStreamWriter(os, "gb2312");
            BufferedWriter bw = new BufferedWriter(w);

            String s;
            while ((s = br.readLine()) != null) {
                bw.write(s);
                bw.newLine(); // readLine() strips the line terminator, so write one back
            }
            br.close();
            bw.close(); // close() flushes the writer and closes the underlying stream
    

    Of course, you still need proper exception handling to make sure both streams are closed even if reading or writing fails.
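
As a rough sketch of the same idea with the cleanup handled automatically, the version below uses try-with-resources (Java 7+) so both files are closed even if an exception is thrown. The class name `TranscodeSketch` and the method `transcode` are made up for illustration, and the code assumes the running JVM ships the GB2312 charset (standard JDK builds normally do). The `main` method also shows the in-memory case from the question: encode the String directly with `getBytes` and decode the resulting bytes with the same charset. Decoding UTF-8 bytes as GB2312, as the question does, is lossy because some of those byte sequences are not valid GB2312.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscodeSketch {
    // GB2312 is not one of the StandardCharsets constants, so it is looked up by name.
    // Assumption: the JVM provides this charset; forName throws if it does not.
    static final Charset GB2312 = Charset.forName("GB2312");

    /** Reads source as UTF-8 text and writes the same characters to target as GB2312. */
    public static void transcode(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, GB2312)) {
            // Copy characters in chunks so line terminators are preserved as-is.
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } // reader and writer are closed here, even on failure
    }

    public static void main(String[] args) throws IOException {
        // "上海上海" written with Unicode escapes so the source file encoding does not matter.
        String str = "\u4e0a\u6d77\u4e0a\u6d77";

        byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
        byte[] gbBytes = str.getBytes(GB2312);
        System.out.println(utf8Bytes.length); // 12: three bytes per character
        System.out.println(gbBytes.length);   // 8: two bytes per character

        // Decode with the SAME charset that produced the bytes.
        String roundTrip = new String(gbBytes, GB2312);
        System.out.println(str.equals(roundTrip)); // true

        // Optional file conversion: java TranscodeSketch <utf8-input> <gb2312-output>
        if (args.length == 2) {
            transcode(Path.of(args[0]), Path.of(args[1]));
        }
    }
}
```

`Path.of` requires Java 11 or later; on older versions `Paths.get` from `java.nio.file` does the same job.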