c++, performance, bit-manipulation, bit-fields

Bit-compressed structure


I am currently working on a project where I need to store quite a large number (on the order of billions) of structures in a vector. I also need to iterate over that vector linearly, so the less data I have to go through, the better.

Therefore I naturally started to optimize the size of a single structure. If, for example, I have multiple bool values, I can store each true/false value in a single bit and pack all of the bool values into one char, one 16-bit integer, or whatever has sufficient size. For some of the entries I only need a 20-bit unsigned integer, so I can compress those values as well.

I then end up with something like this (note that this is only a simplified example):

class Foo {
private:
    uint32_t m_time;
    uint32_t m_comb;  // layout: big (bits 12-31) | small (bits 2-11) | is_blue (bit 1) | is_nice (bit 0)

public:
    Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
        : m_time(t), m_comb((big << 12) | (small << 2) | (is_blue << 1) | is_nice)
    { }

    uint32_t get_time()  const { return m_time; }
    uint32_t get_big()   const { return m_comb >> 12; }
    uint16_t get_small() const { return (m_comb >> 2) & 0x3FF; }
    uint16_t is_blue()   const { return (m_comb >> 1) & 0b1; }
    uint16_t is_nice()   const { return m_comb & 0b1; }
};

The question is: can this somehow be automated using templates? My idea was that I would specify the order of the entries and the required bit size of each, and then I would be able to call get<i>(), which would return the i-th entry of the structure. My motivation is to avoid writing this code by hand, since twiddling bits manually seems quite error-prone to me. I tried to implement this myself but failed hopelessly.
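
Roughly, the interface I have in mind would look something like the following (this is only a sketch of the desired usage; BitPacked, the indices, and the widths are placeholders, not working code I have):

// Hypothetical interface: bit widths given as template parameters,
// fields addressed by index with get<i>() / set<i>().
BitPacked<32, 20, 10, 1, 1> foo;   // time, big, small, is_blue, is_nice
foo.set<1>(12345);                 // write the 20-bit "big" field
auto big = foo.get<1>();           // read it back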


Solution

  • You can implement this very easily with bit fields.

    class Foo {
    private:
        uint32_t m_time;
        uint32_t m_big    : 20;  // 20 + 10 + 1 + 1 = 32 bits, so all four
        uint32_t m_small  : 10;  // fields can share a single 32-bit
        uint32_t m_isblue : 1;   // allocation unit
        uint32_t m_isnice : 1;
    public:
        Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
            : m_time(t), m_big(big), m_small(small), m_isblue(is_blue), m_isnice(is_nice)
        { }
    
        uint32_t get_time()  const { return m_time; }
        uint32_t get_big()   const { return m_big; }
        uint16_t get_small() const { return m_small; }
        uint16_t is_blue()   const { return m_isblue; }
        uint16_t is_nice()   const { return m_isnice; }
    };
    

    Online demo showing size.
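
    In case the link is unavailable: with a typical compiler the four bit fields share one 32-bit allocation unit, so sizeof(Foo) is 8 bytes (4 for m_time plus 4 for the packed fields). Since the packing is implementation-defined (see below), it is worth verifying rather than assuming, for example with:

    // Assumes the four bit fields pack into a single 32-bit unit;
    // adjust the expected size if your implementation packs differently.
    static_assert(sizeof(Foo) == 8, "bit fields did not pack into one 32-bit unit");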

    Edit: additional info

    To summarize the comments: the way the compiler packs the bit fields together is implementation-defined:

    9.6/1 (...) Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. —end note ]

    So you have no guarantee, but compilers usually do their best to pack the fields together. As a rule of thumb, as long as your bit fields are consecutive and their total width does not exceed that of the base type you are using, there is a good chance the packing will be optimal. If needed, you can fine-tune the base type, as this online demo shows.
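
    As for the template automation asked about in the question, it can be approximated by hand. The following is only a rough sketch (assuming C++17, a single 64-bit storage word, no range checking, and illustrative names such as BitPacked), not a drop-in replacement for the bit-field version:

    #include <cstddef>
    #include <cstdint>

    template <std::size_t... Widths>
    class BitPacked {
        static_assert((Widths + ...) <= 64, "fields must fit into 64 bits");

        static constexpr std::size_t widths[] = { Widths... };

        // Offset of field I = sum of the widths of all preceding fields.
        template <std::size_t I>
        static constexpr std::size_t offset() {
            std::size_t off = 0;
            for (std::size_t k = 0; k < I; ++k) off += widths[k];
            return off;
        }

        template <std::size_t I>
        static constexpr std::uint64_t mask() {
            return widths[I] == 64 ? ~0ull : (1ull << widths[I]) - 1;
        }

        std::uint64_t m_bits = 0;

    public:
        template <std::size_t I>
        std::uint64_t get() const {
            return (m_bits >> offset<I>()) & mask<I>();
        }

        template <std::size_t I>
        void set(std::uint64_t v) {
            m_bits = (m_bits & ~(mask<I>() << offset<I>()))
                   | ((v & mask<I>()) << offset<I>());
        }
    };

    // Example: the layout from the question, time | big | small | is_blue | is_nice
    // BitPacked<32, 20, 10, 1, 1> f;
    // f.set<1>(1234);          // big
    // f.set<3>(1);             // is_blue
    // auto big = f.get<1>();   // == 1234

    Whether this is worth the extra machinery compared to plain bit fields is debatable; the bit-field version above is usually the simpler choice.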