I have an arbitrary chunk of bytes that represent chars, encoded in an arbitrary scheme (may be ASCII, UTF-8, UTF-16). I know the encoding.
What I'm trying to do is find the location of the last newline (\n) in the array of bytes. I want to know how many bytes are left over after reading the last encoded \n.
I can't find anything in the JDK or any other library that will let me convert a byte array to chars one by one. InputStreamReader reads the stream in chunks and gives me no indication of how many bytes were read to produce each char.
Am I going to have to do something as horrible as re-encoding each char to figure out its byte length?
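You can try something like this: decode one char at a time with a CharsetDecoder and measure how far the input buffer's position advances for each char.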
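import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

CharsetDecoder cd = Charset.forName("UTF-8").newDecoder();
ByteBuffer in = ByteBuffer.wrap(bytes);
CharBuffer out = CharBuffer.allocate(1);  // room for one char, so decode() stops after each char
int p = 0;                                // input position before the current char
while (in.hasRemaining()) {
    cd.decode(in, out, true);
    if (out.position() == 0) break;       // nothing decoded: malformed or truncated input, or a char that needs a surrogate pair
    char c = out.get(0);
    int nBytes = in.position() - p;       // bytes consumed to produce c
    p = in.position();
    out.clear();                          // reuse the one-char buffer
}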
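Building on the loop above, here is a minimal sketch that answers the original question directly: how many bytes follow the last encoded \n. The class name `TrailingBytes` and the method `bytesAfterLastNewline` are made up for illustration and are not part of the JDK. It assumes the decoder stops as soon as the one-char output buffer is full, which is how the standard JDK decoders for ASCII, UTF-8, and UTF-16 behave. Characters outside the BMP need two chars, so the sketch retries those with a two-char buffer, and it stops at a malformed or truncated tail and counts it as leftover bytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

class TrailingBytes {
    /**
     * Returns the number of bytes after the last encoded '\n',
     * or -1 if no '\n' was decoded. Hypothetical helper, not a JDK API.
     */
    static int bytesAfterLastNewline(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1); // one char per decode() call
        int endOfLastNewline = -1;               // byte offset just past the last '\n'

        while (in.hasRemaining()) {
            out.clear();
            int before = in.position();
            CoderResult result = decoder.decode(in, out, true);

            if (out.position() == 1 && out.get(0) == '\n') {
                endOfLastNewline = in.position();
            }
            if (in.position() == before) {
                if (!result.isOverflow()) {
                    break; // malformed, unmappable, or truncated trailing bytes
                }
                // The next character needs a surrogate pair: decode it into
                // a two-char buffer. Neither half can be '\n'.
                CharBuffer pair = CharBuffer.allocate(2);
                decoder.decode(in, pair, true);
                if (in.position() == before) {
                    break; // still no progress, give up
                }
            }
        }
        return endOfLastNewline < 0 ? -1 : bytes.length - endOfLastNewline;
    }
}
```

For example, `TrailingBytes.bytesAfterLastNewline("ab\ncd".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8)` returns 2. For ASCII and UTF-8 you could instead scan backwards for the byte 0x0A, since it never appears inside a multi-byte sequence, but the decoder loop works for any charset whose decoder behaves as assumed.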