c++ gcc bit-manipulation bit-fields bitset

96-bit long bitfield with non-octet aligned subfields

I need a 96-bit long structure that I can place custom bit fields into. The fields' lengths are all over the place: 8, 3, 26, and 56 bits. It's important that they remain these exact lengths (with one exception, see below).

I see numerous ways of concatenating the data into a single, compact field: std::bitset, structs (holding the fields contiguously), and of course just using ints. However:

The bitset approach is problematic, because operations need to happen really fast: bitset doesn't provide a method to set a range (x..y) out of the whole range (0..96) in a single operation. Damned if I'm going to loop to set individual bits.
The struct approach is problematic, because of the restriction on length described above.
The int approach is problematic, because int64_t is not long enough. I can of course use an int32_t alongside it, but see below.

One obvious solution is to put the 56-bit and 8-bit fields into an int64_t, and the rest into an int32_t. The problem here is that the 56-bit field is the only one that may in fact be reduced later on in development, which would leave me with some spare bits in the int64_t on top of the 32 - (26 + 3) = 3 spare bits in the int32_t.
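To make the split concrete, here is a minimal sketch of that layout. The type name `Packed96`, the member and accessor names, and the choice to put the 8-bit field in the top byte of the 64-bit word are all assumptions made for illustration. Unsigned types are used so that shifts and masks stay well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (names and bit positions are illustrative only):
//   wide:   bits 0..55 = 56-bit field, bits 56..63 = 8-bit field
//   narrow: bits 0..25 = 26-bit field, bits 26..28 = 3-bit field, 3 bits spare
struct Packed96 {
    std::uint64_t wide = 0;
    std::uint32_t narrow = 0;

    static constexpr std::uint64_t kMask56 = (std::uint64_t{1} << 56) - 1;
    static constexpr std::uint32_t kMask26 = (std::uint32_t{1} << 26) - 1;
    static constexpr std::uint32_t kMask3 = 0x7u;

    std::uint64_t field56() const { return wide & kMask56; }
    std::uint8_t field8() const { return static_cast<std::uint8_t>(wide >> 56); }
    std::uint32_t field26() const { return narrow & kMask26; }
    std::uint32_t field3() const { return (narrow >> 26) & kMask3; }

    // Each setter clears only its own bits, then ORs in the masked value.
    void set_field56(std::uint64_t v) { wide = (wide & ~kMask56) | (v & kMask56); }
    void set_field8(std::uint8_t v) {
        wide = (wide & kMask56) | (static_cast<std::uint64_t>(v) << 56);
    }
    void set_field26(std::uint32_t v) { narrow = (narrow & ~kMask26) | (v & kMask26); }
    void set_field3(std::uint32_t v) {
        narrow = (narrow & ~(kMask3 << 26)) | ((v & kMask3) << 26);
    }
};

// Usage sketch: writing one field leaves its neighbour untouched.
inline void packed96_example() {
    Packed96 p;
    p.set_field56(0xABCDEF12345678ULL);
    p.set_field8(0x5A);
    assert(p.field56() == 0xABCDEF12345678ULL);
    assert(p.field8() == 0x5A);
}
```

Note that on many 64-bit ABIs the compiler pads a struct like this to 16 bytes so the 64-bit member stays aligned, so the in-memory size may be larger than 96 bits.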

Are there any ways to store these as compactly as possible (from a code standpoint) while still being able to access broad areas by masking (unlike std::bitset)?

Solution

OK, you have a classic size vs. speed situation here. I'm going to ask: is this a situation where every bit really matters? Is it that big of a deal if a couple of bits go unused? The C coder in me likes either an array of three 32-bit values or the 64-bit plus 32-bit approach. The optimizer in me doesn't like the fact that 96-bit data structures are not completely cache friendly; it would rather pad them to 128 bits, or at least avoid accesses that cross 4-byte boundaries as much as possible.
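As a rough illustration of the padding trade-off, the sketch below compares a bare three-word struct with one padded to 128 bits using `alignas`. The type names `Raw96` and `Padded96` are made up for this example, and the size checks assume a mainstream platform with 8-bit bytes.

```cpp
#include <cstdint>

// Three 32-bit words: 96 bits of payload, 4-byte alignment (hypothetical name).
struct Raw96 {
    std::uint32_t words[3];
};

// Same payload, but aligned (and therefore padded) to 16 bytes, so an object
// never straddles a 16-byte boundary (hypothetical name).
struct alignas(16) Padded96 {
    std::uint32_t words[3];
};

// Assumes 8-bit bytes, which holds on mainstream targets.
static_assert(sizeof(Raw96) == 12, "96 bits of payload, no padding");
static_assert(sizeof(Padded96) == 16, "padded up to 128 bits");
static_assert(alignof(Padded96) == 16, "16-byte aligned");
```

The padded version spends 4 extra bytes per object, which is exactly the size vs. speed question raised above.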

Using a 64-bit value lets you, depending on your target platform, mask out the entire 56-bit entry in one instruction, while the 32-bit version would require at least two operations. But if you could get that value down to 32 bits (or up to 64 bits), then no masking is needed at all and it's full speed ahead, provided you keep that 64-bit value on a 64-bit address boundary. Some targets will let you access unaligned data at a penalty, whereas others will actually throw exceptions.
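The difference can be sketched as follows. The function names and the assumption that the 56-bit field sits in the low bits are mine, and the actual instruction count depends on your compiler and target.

```cpp
#include <cstdint>

constexpr std::uint64_t kMask56 = (std::uint64_t{1} << 56) - 1;

// One 64-bit word: a single AND isolates the 56-bit field
// (assumed here to occupy the low 56 bits).
constexpr std::uint64_t get56_from_u64(std::uint64_t word) {
    return word & kMask56;
}

// Two 32-bit words: `lo` holds the low 32 bits of the field and the low
// 24 bits of `hi` hold the rest, so the read needs a mask, a shift and an OR.
constexpr std::uint64_t get56_from_u32_pair(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi & 0x00FFFFFFu) << 32) | lo;
}
```

Both functions return the same value for the same underlying bits; the second simply has more work to do per access.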

The safest way is the array of three 32-bit values. Your alignment is guaranteed, the math can be kept simple as long as you don't span 32-bit boundaries with your bitfields, and it will be the most portable. If you must span a boundary, you will take a speed hit with extra masking and shifting. But, and this is the big question, are you really, really sure that accessing this data is a speed concern? Do you have a profile in hand showing that this is a bottleneck? If not, I'd just go with the C++ bit-field solution and call it good. Safer and easier to use is pretty much always a win.
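Here is a minimal sketch of both options. All names (`Words96`, `get_bits`, `set_bits`, `Fields96` and its members) are invented for illustration, and the array helpers assume a field never crosses a 32-bit word boundary (`shift + width <= 32`). The 56-bit field would need the extra masking and shifting mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Option 1: three 32-bit words with explicit masks.
using Words96 = std::uint32_t[3];

// Mask with the low `width` bits set (width in 1..32).
constexpr std::uint32_t low_mask(unsigned width) {
    return width >= 32 ? 0xFFFFFFFFu : (std::uint32_t{1} << width) - 1;
}

// Read `width` bits starting at bit `shift` of word `word`.
inline std::uint32_t get_bits(const Words96& w, unsigned word,
                              unsigned shift, unsigned width) {
    return (w[word] >> shift) & low_mask(width);
}

// Overwrite the same range in one read-modify-write, leaving other bits alone.
inline void set_bits(Words96& w, unsigned word, unsigned shift,
                     unsigned width, std::uint32_t value) {
    const std::uint32_t mask = low_mask(width) << shift;
    w[word] = (w[word] & ~mask) | ((value << shift) & mask);
}

// Option 2: let the compiler do the masking with bit-fields.
// Layout and padding are implementation-defined, so sizeof may exceed 12.
struct Fields96 {
    std::uint64_t f56 : 56;
    std::uint64_t f8 : 8;
    std::uint32_t f26 : 26;
    std::uint32_t f3 : 3;
};

// Usage sketch covering both options.
inline void fields_example() {
    Words96 w = {0, 0, 0};
    set_bits(w, 0, 0, 26, 0x2ABCDEFu);  // 26-bit field in bits 0..25 of word 0
    set_bits(w, 0, 26, 3, 5u);          // 3-bit field in bits 26..28 of word 0
    assert(get_bits(w, 0, 0, 26) == 0x2ABCDEFu);
    assert(get_bits(w, 0, 26, 3) == 5u);

    Fields96 f{};
    f.f56 = 0xABCDEF12345678ULL;
    f.f3 = 5;
    assert(f.f56 == 0xABCDEF12345678ULL);
    assert(f.f3 == 5);
}
```

The bit-field version is shorter and harder to get wrong, at the cost of giving up control over the exact bit positions. That matters only if the layout has to match an external format.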