Search code examples
javamemory-managementbitset

Is BitSet worth using?


I made a Java object that has a lot of Boolean fields. I was considering using BitSet when I started questioning its usefulness.

Of course, one would use it for memory reasons, since a boolean is 8 bits alone, 4 in an array. With BitSet, each value is stored as a single bit. However, wouldn't the memory saved be blown out of the water by the following overhead?

  • BitSet class and method definitions meta data (per runtime)
  • The objects needed as keys to semantically retrieve the values (per class using BitSet)
  • The meta data for the bits array in BitSet (per instance)

versus using booleans:

  • boolean value (per instance)

Let's take a look at the following class:

private boolean isVisible; // 8 bits per boolean * 82 booleans = ~0.6Kb
// 81 lines later...
private boolean isTasty;

// ...

public boolean isVisible() { return isVisible; }
// ...
public boolean isTasty() { return isTasty; }

public void setVisible(boolean newVisibility) { isVisible = newVisibility; }
// ...
public void setTasty(boolean newTastiness) { isTasty = newTastiness; }

Now, if I were to combine all my booleans into one BitSet and still keep my code semantic, I might do this:

private static final int _K_IS_VISIBLE = 1; // 32 bits per key * 82 keys = ~2.5Kb
// ...
private static final int _K_IS_TASTY = 82;
private BitSet bools = new BitSet(82); // 2 longs = 64b

// ...

public boolean isVisible() { return bools.get(_K_IS_VISIBLE); }
// ...
public boolean isTasty() { return bools.get(_K_IS_TASTY); }

public void setVisible(boolean newVisibility) { bools.set(_K_IS_VISIBLE, newVisibility); }
// ...
public void setTasty(boolean newTastiness) { bools.set(_K_IS_TASTY, newTastiness); }

tl;dr

costOfUsingBitSet =
    bitSetMethodsAndClassMetaData + // BitSet class overhead
    (numberOfKeysToRetrieveBits * Integer.SIZE) + // Semantics overhead
    (numberOfBitSetsUsed * floor((bitsPerBitSet / Long.SIZE) + 1)); // BitSet internal array overhead

and possibly more. Whereas using booleans would be:

costOfBooleans = 
    (numberOfBooleansOutsideArrays * 8) + 
    (numberOfBooleansInsideArrays * 4);

I feel like the overhead of BitSet is much higher. Am I right?


Solution

  • Nice space comparison here between boolean[] and BitSet:

    https://www.baeldung.com/java-boolean-array-bitset-performance

    chart Think they swapped labels here. Should be more bits per memory (Blue) in BitSet.

    The key takeaway here is, the BitSet beats the boolean[] in terms of the memory footprint, except for a minimal number of bits.

    An alternative in your example is to use 2 long as bit flags.

    class A {
    // 1st set
    private static final long IS_VISIBLE_MASK = 1; 
    ...
    private static final long IS_DARK_MASK = 1 << 63 ; 
    
    // 2nd set...
    private static final long IS_TASTY_MASK = 1;
    ...
    
    // IS_VISIBLE_MASK .. IS_DARK_MASK
    long data1;
    
    // IS_TASTY_MASK ...
    long data2;
    
    boolean isDark = (data1 & IS_DARK_MASK) != 0; 
    
    }
    

    Limitations

    BitSet comes with silly limitations, as you can reach a max of Integer.MAX_VALUE bits. I needed as much bits as I could store in RAM. So modified the original implementation in two ways:

    • It waste less computing for fixed sized LongBitSets (i.e. user specifies length at construction time).
    • it can reach the last bit in the biggest possible word array.

    Added details on limitations in this thread