Search code examples
c++algorithmoptimizationbitsetbitarray

How to build N bits variables in C++?


I am dealing with very large list of booleans in C++, around 2^N items of N booleans each. Because memory is critical in such situation, i.e. an exponential growth, I would like to build a N-bits long variable to store each element.

For small N, for example 24, I am just using unsigned long int. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with build-in variable is unsigned long long int, but it takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much. With a 36-bits variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.

I tried std::bitset, but std::bitset< N > creates a element of at least 8B. So a list of std::bitset< 1 > is much greater than a list of unsigned long int. It is because the std::bitset just change the representation, not the container.

I also tried boost::dynamic_bitset<> from Boost, but the result is even worst (at least 32B!), for the same reason.

I know an option is to write all elements as one chain of booleans, 2473901162496 (2^36*36), then to store then in 38654705664 (2473901162496/64) unsigned long long int, which gives 288GB (38654705664*64/8/1024/1024/1024). Then to access an element is just a game of finding in which elements the 36 bits are stored (can be either one or two). But it is a lot of rewriting of the existing code (3000 lines) because mapping becomes impossible and because adding and deleting items during the execution in some functions will be surely complicated, confusing, challenging, and the result will be most likely not efficient.

How to build a N-bits variable in C++?


Solution

  • How about a struct with 5 chars (and perhaps some fancy operator overloading as needed to keep it compatible to the existing code)? A struct with a long and a char probably won't work because of padding / alignment...

    Basically your own mini BitSet optimized for size:

    struct Bitset40 {
       unsigned char data[5];
       bool getBit(int index) {
         return (data[index / 8] & (1 << (index % 8))) != 0;
       }
       bool setBit(int index, bool newVal) {
         if (newVal) {
            data[index / 8] |= (1 << (index % 8));
         } else {
            data[index / 8] &= ~(1 << (index % 8));
         }
       }
    };
    

    Edit: As geza has also pointed out int he comments, the "trick" here is to get as close as possible to the minimum number of bytes needed (without wasting memory by triggering alignment losses, padding or pointer indirection, see http://www.catb.org/esr/structure-packing/).

    Edit 2: If you feel adventurous, you could also try a bit field (and please let us know how much space it actually consumes):

    struct Bitset36 {
      unsigned long long data:36;
    }