c++ portability endianness unions bit-fields

Portable bit fields for Handles


I want to use and store "handles" to data in an object buffer to reduce allocation overhead. A handle is simply an index into an array holding the objects. However, I need to detect use after reallocation, as this could slip in quite easily. The common approach seems to be using bit fields, but this leads to 2 problems:

  1. Bit fields are implementation defined
  2. Bit shifting is not portable across big/little endian machines.

What I need:

  • Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
  • Store 2 values in the handle with minimum space

What I got:

template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
    typedef T_HandleDef HandleDef;
    typedef T_Storage Storage;

    Handle(): handle_(0){}
private:
    const T_Storage handle_;
};

template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
    static const unsigned numIndexBits = T_numIndexBits;
};

template<class T_Handle>
struct HandleAccessor{
    typedef typename T_Handle::Storage Storage;
    typedef typename T_Handle::HandleDef HandleDef;

    static const unsigned numIndexBits = HandleDef::numIndexBits;
    static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;

    /// "Magic" struct that splits the handle into values
    union HandleData{
        struct
        {
            Storage index : numIndexBits;
            Storage magic : numMagicBits;
        };
        T_Handle handle;
    };
};

An example usage would be:

typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
    HandleAccessor<FooHandle>::HandleData data;
    data.index = idx;
    data.magic = m;
    return data.handle;
}

My goal was to keep the handle as opaque as possible: add a bool check, but nothing else. Users of the handle should not be able to do anything with it except pass it around.

So problems I run into:

  • Reading a union member other than the one last written is UB -> Replace its T_Handle by Storage and add a ctor to Handle from Storage
  • How does the compiler lay out the bit field? I fill the whole union/type, so there should be no padding. So probably the only thing that can differ is which member comes first depending on endianness, correct?
  • How can I store handle_ to a file, load it on a machine with possibly different endianness, and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values IF both members occupy exactly half the space (2 shorts in a uint). But I always want more space for the index than for the magic value.

Note: There are already questions about bitfields and unions. Summary:

  • Bitfields may have unexpected padding (impossible here, as the whole type is occupied)
  • Order of "members" depends on the compiler (only 2 possible ways here; it should be safe to assume the order depends entirely on endianness, so this may or may not actually help here)
  • A specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> This is not an answer here. I also need a specific layout of the values IN the bitfield. So I'm not sure what I get if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load this as binary (no endianness conversion). I'm missing a big-endian machine for testing.

Note: No C++11, but boost is allowed.


Solution

  • The answer is pretty simple (based on another question I forgot the link to and comments by @Jeremy Friesner):

    As "numbers" are already an abstraction in C++, one can be sure that an integer always behaves the same way when it is used as a value (e.g. in a CPU register for any kind of calculation). Bit shifts in C++ are also defined on values and are therefore independent of endianness: for an unsigned x, x << 1 is always equal to x * 2 (modulo 2^N for an N-bit type), no matter how the bytes are ordered in memory. Endianness problems only arise when saving to a file, sending/receiving over a network, or accessing the memory of the variable differently (e.g. via pointers to its bytes).
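    To illustrate the difference between the value and its bytes in memory, here is a minimal sketch. The helper names are made up for this example, and it assumes <stdint.h> is available (boost/cstdint.hpp would be the pre-C++11 alternative):

```cpp
#include <stdint.h>
#include <string.h>

// Value-level view: the shift works on the number itself,
// so the result is the same on every platform.
inline uint32_t highByteOfValue(uint32_t x)
{
    return (x >> 24) & 0xFFu;
}

// Memory-level view: which byte comes first in memory
// depends on the byte order of the host.
inline unsigned char firstByteInMemory(uint32_t x)
{
    unsigned char bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);
    return bytes[0];
}

// Hypothetical helper: true if the host stores the least significant byte first.
inline bool hostIsLittleEndian()
{
    return firstByteInMemory(1u) == 1;
}
```

    highByteOfValue(0x11223344) is 0x11 everywhere, while firstByteInMemory(0x11223344) is 0x44 on a little-endian host and 0x11 on a big-endian one.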

    One cannot use C++ bitfields here, as the order of the "entries" is implementation-defined and cannot be relied upon. Bitfield containers might be OK if they allow access to the data as a "number".
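    A possible replacement for the HandleData union that only ever treats the storage as a number could look roughly like this. ShiftAccessor and its member names are hypothetical, not part of the original code, and the sketch assumes an unsigned storage type with 0 < T_numIndexBits < number of bits in the storage:

```cpp
#include <cassert>
#include <climits>    // CHAR_BIT
#include <stdint.h>   // assumption: available; boost/cstdint.hpp otherwise

// Sketch: index lives in the low bits, magic in the high bits of the value.
template<unsigned T_numIndexBits, typename T_Storage = uint32_t>
struct ShiftAccessor
{
    static const unsigned numIndexBits = T_numIndexBits;
    static const unsigned numMagicBits = sizeof(T_Storage) * CHAR_BIT - T_numIndexBits;

    static T_Storage indexMask()
    {
        return static_cast<T_Storage>((T_Storage(1) << numIndexBits) - 1);
    }
    static T_Storage magicMask()
    {
        return static_cast<T_Storage>((T_Storage(1) << numMagicBits) - 1);
    }

    // Combine both values into one number; both must fit into their bit ranges.
    static T_Storage pack(T_Storage index, T_Storage magic)
    {
        assert(index <= indexMask() && magic <= magicMask());
        return static_cast<T_Storage>((magic << numIndexBits) | index);
    }
    static T_Storage getIndex(T_Storage handle)
    {
        return handle & indexMask();
    }
    static T_Storage getMagic(T_Storage handle)
    {
        return static_cast<T_Storage>(handle >> numIndexBits);
    }
};
```

    With 24 index bits, ShiftAccessor<24>::pack(0x123456, 0xAB) yields the value 0xAB123456 on any machine, because only value-level operations are involved.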

    The safest option is (still) using bit shifts, which are very simple in this case (only 2 values). During storing/serialization the number must then be written in an endian-agnostic way.
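    One way to store the number in an endian-agnostic way is to split it into bytes with shifts and always write them in a fixed order. The sketch below picks "least significant byte first" as the file format and assumes a 32-bit handle; storeHandle and loadHandle are hypothetical names:

```cpp
#include <stdint.h>   // assumption: available; boost/cstdint.hpp otherwise

// Write the value as 4 bytes, least significant byte first,
// regardless of the byte order of the host.
inline void storeHandle(uint32_t handle, unsigned char out[4])
{
    for (unsigned i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((handle >> (8 * i)) & 0xFFu);
}

// Rebuild the value from 4 bytes written by storeHandle.
inline uint32_t loadHandle(const unsigned char in[4])
{
    uint32_t handle = 0;
    for (unsigned i = 0; i < 4; ++i)
        handle |= static_cast<uint32_t>(in[i]) << (8 * i);
    return handle;
}
```

    The byte array can then be handed to the file handler as is. Since index and magic are extracted from the reloaded number by shifts and masks, they come out the same on big- and little-endian machines, even when they do not each occupy half of the storage.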