Tags: java, character-encoding, character-arrays

char operations on byte arrays in Java


I have a byte array that "contains" text, but its encoding/charset is unknown at this time.

How can I remove whitespace, \n and \r characters from it, without creating a String object from the byte array, of course?

The goal is to display the byte array as text, using a charset specified by the user, but without these whitespace, \n and \r characters.


Solution

  • I have a byte array that "contains" text, but its encoding/charset is unknown at this time.

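    If you don't know the encoding, there is simply no concept of whitespace, \r or \n characters in the data. Depending on the encoding, those characters map to different bytes (for example, \n is the single byte 0x0A in UTF-8 but the two bytes 0x00 0x0A in UTF-16BE), and a byte with the value 0x0A is not necessarily a newline at all.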

    You must determine the encoding before you can reliably perform any text-based operations. Until then, you simply don't have text.

    So basically you need to reorder your steps: ask the user for the encoding, then convert the byte array into text, then remove whitespace (e.g. with a regular expression).
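To make the point about bytes concrete, here is a small sketch that uses only the standard `String.getBytes(Charset)` method; the class `NewlineBytes` and its helper `encode` are hypothetical names for illustration. It prints how `\n` is encoded under a few standard charsets, and shows a letter (U+010A, `Ċ`) whose UTF-16BE encoding contains a 0x0A byte that is not a newline:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NewlineBytes {
    // Hypothetical helper: returns the bytes a string becomes under the given charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // The same newline character produces three different byte sequences.
        System.out.println("newline in UTF-8:    " + Arrays.toString(encode("\n", StandardCharsets.UTF_8)));    // [10]
        System.out.println("newline in UTF-16BE: " + Arrays.toString(encode("\n", StandardCharsets.UTF_16BE))); // [0, 10]
        System.out.println("newline in UTF-16LE: " + Arrays.toString(encode("\n", StandardCharsets.UTF_16LE))); // [10, 0]

        // U+010A is a letter, yet its UTF-16BE encoding contains the byte 0x0A (10).
        // Dropping every 0x0A byte would corrupt this character instead of removing a newline.
        System.out.println("U+010A in UTF-16BE:  " + Arrays.toString(encode("\u010A", StandardCharsets.UTF_16BE))); // [1, 10]
    }
}
```

So filtering "newline bytes" out of the raw array only works if you already assume a particular encoding, which is exactly the information that is missing.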
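Following the reordered steps, a minimal sketch might decode with the user's charset first and only then strip whitespace with a regular expression. It assumes the user's choice is already available as a `java.nio.charset.Charset`; the class `StripWhitespace` and the method `stripWhitespace` are hypothetical names. Because `Charset.decode` returns a `CharBuffer`, which implements `CharSequence`, the regex runs on the decoded characters directly, so no intermediate `String` of the raw input is built (the result is still a `String`, since that is what gets displayed):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class StripWhitespace {
    // By default Java's \s matches only ASCII whitespace: [ \t\n\x0B\f\r].
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: decode with the user's charset, then remove all whitespace runs.
    static String stripWhitespace(byte[] data, Charset userCharset) {
        // Charset.decode replaces malformed or unmappable input with the
        // charset's replacement character instead of throwing.
        CharBuffer text = userCharset.decode(ByteBuffer.wrap(data));
        // CharBuffer is a CharSequence, so the matcher can work on it directly.
        return WHITESPACE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        byte[] raw = "Hello,\r\n wide\tworld\n".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(stripWhitespace(raw, StandardCharsets.UTF_16LE)); // Hello,wideworld
    }
}
```

If the user supplies a charset name rather than a `Charset`, `Charset.forName(name)` performs the lookup and throws `UnsupportedCharsetException` or `IllegalCharsetNameException` for names the JVM does not accept. To also strip Unicode whitespace such as U+00A0, the pattern can be compiled with the `Pattern.UNICODE_CHARACTER_CLASS` flag.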