java, utf-8, character-encoding, utf-16

Decode bytes to chars one at a time


I have an arbitrary chunk of bytes that represent chars, encoded in an arbitrary scheme (it may be ASCII, UTF-8 or UTF-16). I know the encoding.

What I'm trying to do is find the location of the last newline (\n) in the byte array. I want to know how many bytes are left over after the last encoded \n.
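Why the encoding matters here: for ASCII and UTF-8, scanning backwards for the byte 0x0A happens to work, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher. For UTF-16 it does not, since 0x0A can be one half of an ordinary character. A small sketch, using a hypothetical helper `lastRawNewlineByte` that ignores the encoding entirely:

```java
import java.nio.charset.StandardCharsets;

public class NaiveScan {
    // Hypothetical helper: index of the last 0x0A byte, ignoring the encoding.
    public static int lastRawNewlineByte(byte[] bytes) {
        for (int i = bytes.length - 1; i >= 0; i--) {
            if (bytes[i] == 0x0A) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "a\n\u010A";  // U+010A ('Ċ') encodes as 01 0A in UTF-16BE
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);      // 61 0A C4 8A
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);  // 00 61 00 0A 01 0A
        System.out.println(lastRawNewlineByte(utf8));   // prints 1: the real '\n'
        System.out.println(lastRawNewlineByte(utf16));  // prints 5: low byte of U+010A, not a newline
    }
}
```

In UTF-16BE the real newline ends at index 3, so a byte-level scan is only safe for encodings like ASCII and UTF-8; in general the bytes have to be decoded.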

I can't find anything in the JDK or any other library that will let me convert a byte array to chars one by one. InputStreamReader reads the stream in chunks and gives no indication of how many bytes were consumed to produce each char.

Am I going to have to do something as horrible as re-encoding each char to figure out its byte length?


Solution

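  • You can try something like this: decode into a one-char CharBuffer, so that each decode call consumes exactly the bytes of one char, and measure how far the input position moved: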

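        import java.nio.ByteBuffer;
        import java.nio.CharBuffer;
        import java.nio.charset.*;

        // bytes is the input byte[]; use whichever charset it was encoded with.
        // The enclosing method must declare CharacterCodingException.
        CharsetDecoder cd = Charset.forName("UTF-8").newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1);   // room for exactly one char per call
        int p = 0;                                 // input position before the current char
        while (in.hasRemaining()) {
            CoderResult r = cd.decode(in, out, true);
            if (r.isError()) r.throwException();   // malformed input would otherwise loop forever
            if (in.position() == p)                // no progress: a surrogate pair needs 2 chars
                throw new IllegalStateException("code point does not fit in a 1-char buffer");
            char c = out.array()[0];               // the char just decoded
            int nBytes = in.position() - p;        // bytes consumed to produce c
            p = in.position();
            out.clear();                           // reset the buffer for the next char
        }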
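Building on the answer, here is a minimal sketch that answers the original question directly: it decodes one char per call and records the byte offset just past the last '\n'. The class `LastNewline`, the method `bytesAfterLastNewline` and its "-1 means no newline" convention are hypothetical. It assumes the decoder's default REPORT action for malformed input (turned into an unchecked exception here), and it handles code points outside the BMP, which decode to a surrogate pair, by retrying with a 2-char buffer when the 1-char call makes no progress.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class LastNewline {

    /**
     * Hypothetical helper: returns how many bytes follow the last encoded '\n'
     * in {@code bytes}, or -1 if the input contains no '\n'.
     */
    public static int bytesAfterLastNewline(byte[] bytes, Charset charset) {
        CharsetDecoder cd = charset.newDecoder();   // reports malformed input by default
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer one = CharBuffer.allocate(1);    // holds one BMP char
        CharBuffer two = CharBuffer.allocate(2);    // holds one surrogate pair
        int endOfLastNewline = -1;                  // byte offset just past the last '\n'

        while (in.hasRemaining()) {
            int before = in.position();
            one.clear();
            check(cd.decode(in, one, true), in);
            if (in.position() == before) {
                // No progress: the next code point needs two chars (a surrogate pair),
                // so retry with room for both. A surrogate pair is never '\n'.
                two.clear();
                check(cd.decode(in, two, true), in);
                if (in.position() == before) {
                    throw new IllegalStateException("decoder made no progress at byte " + before);
                }
                continue;
            }
            // one.position() can be 0 if the call consumed only a byte-order mark
            if (one.position() == 1 && one.get(0) == '\n') {
                endOfLastNewline = in.position();
            }
        }
        return endOfLastNewline < 0 ? -1 : bytes.length - endOfLastNewline;
    }

    // Turn malformed/unmappable results into an unchecked exception.
    private static void check(CoderResult r, ByteBuffer in) {
        if (r.isError()) {
            throw new IllegalArgumentException(r + " at byte " + in.position());
        }
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncd\nxyz".getBytes(StandardCharsets.UTF_16);  // BOM + 9 chars = 20 bytes
        System.out.println(bytesAfterLastNewline(data, StandardCharsets.UTF_16));  // prints 6
    }
}
```

The leftover count is what you would carry over to the next chunk. Decoding through `CharsetDecoder` rather than searching for the encoded bytes of "\n" keeps the logic identical for every charset, including UTF-16, where a byte-level search can land inside another character.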