java bitset

BitSet from byte[] with strange length


My code is:

String blah = "blah";
byte[] blahBytes = blah.getBytes("US-ASCII");
System.out.println(Arrays.toString(blahBytes));
BitSet set = BitSet.valueOf(blahBytes);
System.out.println(set.length());

The output is:

[98, 108, 97, 104]
31

Why is length() returning 31? Shouldn't it be 32?


Solution

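  • BitSet.length() returns the "logical size" of the set: the index of the highest bit set to 1, plus one. BitSet.valueOf(byte[]) reads the array in little-endian order, so the last byte of the array supplies the highest bits. Because every byte you pass is an ASCII character, its most significant bit (bit 7) is always zero. For a four-byte input, the highest set bit therefore has index 30 at most, and length() returns 31 at most. The exact value depends on the last character of your string: 'h' (104, binary 01101000) has bit 6 as its highest set bit, so the highest set bit overall is 24 + 6 = 30 and length() is 31. If you pass "bla1" instead of "blah" you get 30 (demo 1), because the highest set bit of '1' (49, binary 00110001) is bit 5. If the string ends in a control character such as <TAB> (9, binary 00001001), you get an even shorter length of 28 (demo 2).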

    If you would like to get a length rounded up to the next multiple of 8, use

    int roundedLength = 8 * ((set.length() + 7) / 8);
    

    demo 3
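
The linked demos are not reproduced here, so the following is a minimal sketch of what they would plausibly show, not the original demo code. The class name BitSetLengthDemo and the helpers ascii, logicalLength and roundedLength are hypothetical names chosen for illustration; only BitSet.valueOf(byte[]) and BitSet.length() come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class BitSetLengthDemo {

    // Hypothetical helper: encode a string as ASCII bytes.
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Logical length: index of the highest set bit plus one (0 if no bit is set).
    static int logicalLength(byte[] bytes) {
        return BitSet.valueOf(bytes).length();
    }

    // Logical length rounded up to the next multiple of 8, using the formula above.
    static int roundedLength(byte[] bytes) {
        return 8 * ((logicalLength(bytes) + 7) / 8);
    }

    public static void main(String[] args) {
        System.out.println(logicalLength(ascii("blah")));   // 31: 'h' = 01101000
        System.out.println(logicalLength(ascii("bla1")));   // 30: '1' = 00110001
        System.out.println(logicalLength(ascii("bla\t")));  // 28: TAB = 00001001
        System.out.println(roundedLength(ascii("blah")));   // 32

        // Caveat: a trailing zero byte contributes no set bits at all,
        // so rounding the logical length does not recover it.
        byte[] trailingZero = {98, 108, 97, 0};
        System.out.println(roundedLength(trailingZero));    // 24, not 32
        System.out.println(trailingZero.length * 8);        // 32
    }
}
```

As the last two lines suggest, rounding length() only recovers the input size when the final byte is non-zero. If what you need is the number of bits in the original array, multiplying the array length by 8 is probably the more reliable choice.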