
How to trim a String by bytes?


I have some UTF-8 text and I want to trim / truncate it by bytes, so that I get a new String whose encoded length is a custom number of bytes.

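public static String trimByBytes(String text, int longitudBytes) throws Exception {

    byte[] bytes_text = text.getBytes("UTF-8");
    int negativeBytes = 0;

    byte[] byte_trimmed = new byte[longitudBytes];
    if (byte_trimmed.length <= bytes_text.length) {
        // copy the array manually and count the negative bytes
        // (every byte of a multi-byte UTF-8 character is negative as a Java byte)
        for (int i = 0; i < byte_trimmed.length; i++) {
            byte_trimmed[i] = bytes_text[i];
            if (byte_trimmed[i] < 0) {
                negativeBytes++;
            }
        }
        // if the number of negative bytes is odd, assume the last character was cut in half
        // (this parity check only holds when every non-ASCII character is 2 bytes long)
        if (negativeBytes % 2 != 0 && byte_trimmed[byte_trimmed.length - 1] < 0) {
            byte_trimmed[byte_trimmed.length - 1] = 0; // overwrite the last byte with 0 (decodes to '\0', not deleted)
        }
    } else {
        // text is shorter than the limit: copy everything (the unused tail of byte_trimmed stays 0)
        for (int i = 0; i < bytes_text.length; i++) {
            byte_trimmed[i] = bytes_text[i];
        }
    }
    // decode with the same charset that was used to encode
    return new String(byte_trimmed, "UTF-8");
}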

e.g.

  • Signature: String trimByBytes(String str, int lengthOfBytes); e.g. trimByBytes("Gómez", 1)
  • "Gómez" is 6 bytes long (but only 5 chars long)
  • "Gómez" trimmed at 3 is "Gó": OK
  • "Gómez" trimmed at 2 is "G�", but I want "G" (drop the incomplete character)
  • "Gómez" trimmed at 1 is "G": OK
  • "Gómez" trimmed at 8 is "Gómez": OK
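To make the byte counts in these examples concrete, here is a small sketch. The class and method names (`Utf8ByteLayout`, `utf8Length`, `naiveCut`) are made up for illustration. It shows that "Gómez" is 5 chars but 6 UTF-8 bytes, that both bytes of ó are negative when read as Java's signed `byte`, and that naively decoding only the first 2 bytes yields the replacement character U+FFFD, which is the "G�" above. The string is written as `"G\u00F3mez"` so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ByteLayout {
    // Hypothetical helper: number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: decode only the first n UTF-8 bytes, without
    // respecting character boundaries (the behavior the question wants to avoid).
    static String naiveCut(String s, int n) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, Math.min(n, bytes.length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "G\u00F3mez"; // "Gomez" with an o-acute (U+00F3)

        System.out.println(name.length());    // 5
        System.out.println(utf8Length(name)); // 6

        // Java bytes are signed: o-acute is encoded as 0xC3 0xB3, i.e. -61 -77.
        System.out.println(Arrays.toString(name.getBytes(StandardCharsets.UTF_8)));
        // [71, -61, -77, 109, 101, 122]

        // Cutting after 2 bytes leaves a lone lead byte, which the String
        // constructor replaces with U+FFFD.
        System.out.println(naiveCut(name, 2).equals("G\uFFFD")); // true
    }
}
```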

Solution

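  • Create an explicit CharsetDecoder, and set its malformed-input action to CodingErrorAction.IGNORE.

    Since a CharsetDecoder works with ByteBuffers, applying the length limit is as easy as calling the ByteBuffer’s limit method. A multi-byte character that the limit cuts in half then counts as malformed input at the end of the buffer, so the decoder silently drops it instead of emitting �: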

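    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    String trimByBytes(String str, int lengthOfBytes) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);

        if (lengthOfBytes < buffer.limit()) {
            buffer.limit(lengthOfBytes);
        }

        // A character cut in half by the limit is malformed input at the end
        // of the buffer; IGNORE makes the decoder drop it instead of failing.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.IGNORE);

        try {
            return decoder.decode(buffer).toString();
        } catch (CharacterCodingException e) {
            // Should not happen: malformed input is ignored above.
            throw new RuntimeException(e);
        }
    }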
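If you would rather not set up a decoder, the same result can be had by hand, because UTF-8 is self-synchronizing: every continuation byte has the bit pattern `10xxxxxx`. The sketch below is an alternative to the answer above, not part of it, and the class name `Utf8Truncate` is made up. It looks at the first byte that would be cut off and, while that byte is a continuation byte, moves the cut point back, so the kept prefix always ends on a character boundary. It assumes the bytes come from `String.getBytes(StandardCharsets.UTF_8)`, which always yields well-formed UTF-8, and it treats a negative limit as 0. Unlike the negative-byte parity check in the question, this also handles 3- and 4-byte characters.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Truncates str so that its UTF-8 encoding is at most maxBytes long,
    // without splitting a multi-byte character.
    static String trimByBytes(String str, int maxBytes) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        if (maxBytes >= bytes.length) {
            return str; // already short enough
        }
        int end = Math.max(maxBytes, 0);
        // bytes[end] is the first byte being dropped. If it is a continuation
        // byte (10xxxxxx), the character it belongs to started before `end`,
        // so back up until `end` sits on that character's lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "G\u00F3mez"; // "Gomez" with an o-acute: 5 chars, 6 bytes
        System.out.println(trimByBytes(name, 1)); // G
        System.out.println(trimByBytes(name, 2)); // G
        System.out.println(trimByBytes(name, 3).equals("G\u00F3")); // true
        System.out.println(trimByBytes(name, 8).equals(name));      // true
    }
}
```

Scanning backwards touches at most 3 bytes, so the cost is dominated by the single `getBytes` call; the decoder-based version does the same boundary detection inside the JDK instead.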