Search code examples
c++cintprotocol-buffersflatbuffers

Is there a portable Binary-serialisation schema in FlatBuffers/Protobuf that supports arbitrary 24bit signed integer definitions?


We are sending data over UART Serial at a high data rate so data size is important. The most optimal format is Int24 for our data which may be simplified as a C bit-field struct (GCC compiler) under C/C++ to be perfectly optimal:

#pragma pack(push, 1)
struct Int24
{
    int32_t value : 24;
};
#pragma pack(pop)

typedef std::array<Int24,32> ArrayOfInt24;

This data is packaged with other data and shared among devices and cloud infrastructures. Basically we need to have a binary serialization which is sent between devices of different architecture and programming languages. We would like to use a Schema based Binary serialisation such as ProtoBuffers or FlatBuffers to avoid the client codes needing to handle the respective bit-shifting and recovery of the twos-complement sign bit handling themselves. i.e. Reading the 24-bit value in a non-C language requires the following:

bool isSigned = (_b2 & (byte)0x80) != 0; // Sign extend negative quantities 
int32_t value = _b0 | (_b1 << 8) | (_b2 << 16) | (isSigned ? 0xFF : 0x00) << 24;

If not already existing which (if any) existing Binary Serialisation library could be modified easily to extend support to this as we would be willing to add to any open-source project in this respect.


Solution

  • Depending on various things, you might like to look at ASN.1 and the unaligned Packed Encoding Rules (uPER). This is a binary serialisation that is widely used in telephony to easily minimise the number of transmitted bits. Tools are available for C, C++, C#, Java, Python (I think they cover uPER). A good starting point is Useful Old Technologies.

    One of the reasons you might choose to use it is that uPER likely ends up doing better than anything else out there. Other benefits are contraints (on values and array sizes). You can express these in your schema, and the generated code will check data against them. This is something that can make a real difference to a project - automatic sanitisation of incoming data is a great way of resisting attacks - and is something that GPB doesn't do.

    Reasons not to use it are that the very best tools are commercial, and quite pricey. Though there are some open source tools that are quite good but not necessarily implementing the entire ASN.1 standard (which is vast). It's also a learning curve, though (at a basic level) not so very different to Google Protocol Buffers. In fact, at the conference where Google announced GPB, someone asked "why not use ASN.1?". The Google bod hadn't heard of it; somewhat ironic, a search company not searching the web for binary serialisation technologies, went right ahead and invented their own...