I have a bitset which i am using to track whether an item is present or not example
b = 01100110000
it represents that 2nd and 3rd items are present and 1st and 4th item are not present.
While searching for library which can optimise this bitset array. I came across Roaring bitmaps which sounded very exciting.
I did a quick test with it,
public static void main(String[] args) throws IOException {
RoaringBitmap roaringBitMap = new RoaringBitmap();
BitSet bitSet = new BitSet(5000);
double prob = 0.001;
Random random = new Random();
for (int i = 0; i < 5000; i++) {
if (random.nextDouble() < prob) {
bitSet.set(i);
roaringBitMap.add(i);
}
}
System.out.println(bitSet.cardinality());
System.out.println("bitset bytes: "+ bitSet.size());
System.out.println("RoaringBitmap bytes: " + roaringBitMap.getSizeInBytes() * 8);
}
Basically we are setting some values and check overall size of data structure.
when we run this with multiple prob values. I got
prob byte | bitset bytes | RoaringBitmap bytes |
---|---|---|
0.001 | 5056 | 288 |
0.01 | 5056 | 944 |
0.1 | 5056 | 7872 |
0.999 | 5056 | 65616 |
If you see as we insert more and more numbers, the memory footprint of RoaringBitmap increases.
this seems to be the case when number of entries are small, But as we increase the number of entries, the different becomes less visible. Although it is not confirmed by the lib author ( i asked here and followed up here)
prob | number of entries | bitset bits | RoaringBitmap bits | saving % |
---|---|---|---|---|
0.001 | 50000 | 50048 | 928 | 98 |
0.01 | 50000 | 50048 | 7744 | 84 |
0.1 | 50000 | 50048 | 65616 | -31 |
0.999 | 50000 | 50048 | 65616 <- NOTE it does not increase | -31 |
0.001 | 500000 | 500032 | 8704 | 98 |
0.01 | 500000 | 500032 | 80720 | 83 |
0.1 | 500000 | 500032 | 524480 | -4 |
0.999 | 500000 | 500032 | 524480 <- NOTE it does not increase | -4 |
0.001 | 50000000 | 50000000 | 835232 | 98 |
0.01 | 50000000 | 50000000 | 8036368 | 83 |
0.1 | 50000000 | 50000000 | 50016240 | -0.03 |
0.999 | 50000000 | 50000000 | 50016240 <- NOTE it does not increase | -0.03 |
looking at this it seems like as number of entries grows they might be using bitmap only behind the scene. The take away is that don't blindly use the library, test for your use case.