I'm trying to decode the character · using the GB2312 charset in Java.
This character is contained in GB2312, where its positional code is a1a4 (check here).
Code:
public static void main(String[] _args) throws Exception {
    String str="a1a4:· a5f6:ヶ a8c5:ㄅ";
    ByteBuffer bf=readToByteBuffer(new ByteArrayInputStream(str.getBytes()));
    System.out.println(Charset.forName("GB2312").decode(bf).toString());
}

private static final int bufferSize = 0x20000;

static ByteBuffer readToByteBuffer(InputStream inStream) throws IOException {
    byte[] buffer = new byte[bufferSize];
    ByteArrayOutputStream outStream = new ByteArrayOutputStream(bufferSize);
    int read;
    while (true) {
        read = inStream.read(buffer);
        if (read == -1)
            break;
        outStream.write(buffer, 0, read);
    }
    ByteBuffer byteData = ByteBuffer.wrap(outStream.toByteArray());
    return byteData;
}
The code above outputs:
a1a4:? a5f6:ヶ a8c5:ㄅ
I don't understand why a1a4 can't be decoded.
In my browser, your string has its sixth character (index 5, right after "a1a4:") encoded as 0xB7, which is MIDDLE DOT, not KATAKANA MIDDLE DOT. However, according to the same database you mentioned, that code point is not available in the GB2312 character set. Likewise, you can see that neither MIDDLE DOT nor an encoding of 0xB7 is listed as being part of GB2312.
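If you want to confirm which code point your string really contains, a small sketch like the following may help. It prints every code point of a string together with its Unicode name. The class name CodePointDump and the sample string are only assumptions for illustration; paste your own literal into main to inspect it.

```java
class CodePointDump {
    // Formats one code point as "U+XXXX NAME" using the JDK's Unicode name table.
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // Hypothetical sample: Unicode escapes are used so that the result
        // does not depend on the encoding of the source file.
        String str = "a1a4:\u00B7 a1a4:\u30FB";
        // Print one line per code point in the string.
        str.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

The two dots look nearly identical on screen, so printing the code points is usually more reliable than comparing glyphs by eye.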
I think the problem here is with the characters in your input string, not in the CharsetDecoder provided by your JRE.
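To check this on your own JRE rather than taking the database's word for it, you could ask the charset's encoder directly whether it accepts each candidate character. This is only a sketch: the class name CharsetProbe and the helper canEncode are names I made up, and the answer it prints depends on the mapping table your JRE ships, so I am not claiming a particular output.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class CharsetProbe {
    // Returns true if this JRE's encoder for the named charset can encode the character.
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // MIDDLE DOT and KATAKANA MIDDLE DOT, written as escapes to avoid
        // any dependence on the source file encoding.
        char[] candidates = {'\u00B7', '\u30FB'};
        for (char c : candidates) {
            System.out.printf("U+%04X %s: GB2312 encodable = %b%n",
                    (int) c, Character.getName(c), canEncode("GB2312", c));
        }
    }
}
```

Note also that str.getBytes() without an argument uses the platform default charset, so the bytes handed to the GB2312 decoder in your code depend on the machine it runs on. Passing an explicit charset name to getBytes makes that step reproducible.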