Search code examples
javamemorymemory-managementjvmbitset

Is there anyway to store bits in Java without memory overhead?


This has caught my interest and attention since yesterday. I am trying to store bits in Java and hit by Memory Overhead.

My first question regarding same is What is size of my Bitset?

Based on the answers I looked at other references and found Memory Usage guide.

Then I looked at BitSet source code which looks like

public class BitSet implements Cloneable, java.io.Serializable {
    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

.....
}

As per the calculation based on Memory Guide, this is what I calculated

8  Bytes: housekeeping space
12 Bytes: 3 ints
8  Bytes: long
12 Bytes: long[]
4  Bytes: transient int // does it count?
1  Byte : transient boolean
3  Bytes: padding

This sums to 45 + 3 bytes (padding to reach multiple of 8)

This means an empty BitSet itself reserves 48 bytes.

But my requirement was to store bits, What am I missing? What are my options here?

Thanks much

UPDATE

My requirement is that I want to store total of 64 bits in two separate fields

class MyClass{
    BitSet timeStamp
    BitSet id
}

and I want to store millions of MyClass objects in memory


Solution

  • This sums to 45 + 3 bytes (padding to reach multiple of 8) This means an empty BitSet itself reserves 48 bytes.

    First of all I want to advise the right tool to analyze object layout schemes in JVMs - JOL. In your case(java -jar jol-cli/target/jol-cli.jar internals java.util.BitSet) JOL produces the following result:

    Running 64-bit HotSpot VM.
    Using compressed references with 3-bit shift.
    Objects are 8 bytes aligned.
    Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    
    java.util.BitSet object internals:
     OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
          0     4         (object header)                01 00 00 00 (00000001 00000000 00000000 00000000) (1)
          4     4         (object header)                00 00 00 00 (00000000 00000000 00000000 00000000) (0)
          8     4         (object header)                f4 df 9f e0 (11110100 11011111 10011111 11100000) (-526393356)
         12     4     int BitSet.wordsInUse              0
         16     1 boolean BitSet.sizeIsSticky            false
         17     3         (alignment/padding gap)        N/A
         20     4  long[] BitSet.words                   [0]
    Instance size: 24 bytes (reported by Instrumentation API)
    Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
    

    Your calculations were not correct because of static fields, thus an empty BitSet itself reserves 24 bytes. Please note that these calculations are not 100% exact because it was not taken into account size of long[] object. So the right results are java -jar jol-cli/target/jol-cli.jar externals java.util.BitSet:

    Running 64-bit HotSpot VM.
    Using compressed references with 3-bit shift.
    Objects are 8 bytes aligned.
    Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
    
    java.util.BitSet@6b25f76bd object externals:
              ADDRESS       SIZE TYPE             PATH                           VALUE
            7ae321a48         24 java.util.BitSet                                (object)
            7ae321a60         24 [J               .words                         [0]
    

    This means an empty BitSet itself uses 48 bytes including long array. In order to optimize memory footprint you can write your own implementation of BitSet. For example in your use case following options are available:

    public class MyOwnBitSet {
        long word1;
        long word2;
    }
    
    public class MyOwnBitSet2 {
        long[] word = new long[2];
    }
    
    public class MyOwnBitSet3 {
        int index;
    }
    

    JOL produces the following result:

    MyOwnBitSet@443b7951d object externals:
              ADDRESS       SIZE TYPE                                                   PATH                           VALUE
            76ea4c7f8         32 MyOwnBitSet                                (object)
    
    
    MyOwnBitSet2@69663380d object externals:
              ADDRESS       SIZE TYPE                                                    PATH                           VALUE
            76ea53800         16 MyOwnBitSet2                                (object)
            76ea53810         32 [J                                                      .word                          [0, 0]
    
    
    MyOwnBitSet3@5a2e4553d object externals:
              ADDRESS       SIZE TYPE                                                    PATH                           VALUE
            76ea5c070         16 MyOwnBitSet3                                (object)
    

    Let me explain the last example MyOwnBitSet3. In order to decrease memory footprint you can preallocate a huge array of long/int objects and store only pointer on the right cell. With a sufficiently large number of objects, this option is the most advantageous.