Tags: c, portability, binary-files, endianness, htonl

Portable C binary serialization primitives


As far as I know, the C library provides no help in serializing numeric values into a non-text byte stream. Correct me if I'm wrong.

The most standard tool in use is htonl et al. from POSIX. These functions have shortcomings:

  • There is no 64-bit support.
  • There is no floating-point support.
  • There are no versions for signed types. When deserializing, converting the unsigned result to a signed type is implementation-defined whenever the value is out of range (C11 6.3.1.3), so negative values do not round-trip portably.
  • Their names do not state the size of the datatype.
  • They depend on 8-bit bytes and on the presence of the exact-width uintN_t types.
  • The input types are the same as the output types, instead of referring to a byte stream.
    • This forces the user to cast a pointer into the byte buffer, and that cast may violate alignment requirements.
    • Having performed that cast, the user is likely to convert and write out a whole structure in its native memory layout, a poor practice that results in unexpected errors.
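For comparison with htonl, here is a minimal sketch of byte-stream primitives that address several of the points above. The functions read and write an unsigned char buffer, so no alignment-sensitive pointer cast is needed. They fix the byte order as big-endian using shifts, use the uint_leastN_t types (which C99 always provides) instead of the exact-width ones, and decode signed values without an out-of-range unsigned-to-signed conversion. The names (put_u32be, get_i32be and so on) are hypothetical and do not come from any standard or library. The floating-point pair assumes that uint64_t exists, that double is IEEE 754 binary64, and that double is stored in the same byte order as uint64_t. That assumption is common but is not portable in the sense the question asks about.

```c
#include <stdint.h>
#include <string.h>

/* Each octet goes into the low 8 bits of one unsigned char and is masked on
   the way back in, so nothing here needs CHAR_BIT == 8 or an aligned buffer.
   The byte order is always big-endian, whatever the host uses. */
static void put_u32be(unsigned char *p, uint_least32_t v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

static uint_least32_t get_u32be(const unsigned char *p)
{
    return ((uint_least32_t)(p[0] & 0xFF) << 24)
         | ((uint_least32_t)(p[1] & 0xFF) << 16)
         | ((uint_least32_t)(p[2] & 0xFF) << 8)
         |  (uint_least32_t)(p[3] & 0xFF);
}

static void put_u64be(unsigned char *p, uint_least64_t v)
{
    put_u32be(p, (uint_least32_t)((v >> 32) & 0xFFFFFFFFu));
    put_u32be(p + 4, (uint_least32_t)(v & 0xFFFFFFFFu));
}

static uint_least64_t get_u64be(const unsigned char *p)
{
    return ((uint_least64_t)get_u32be(p) << 32) | get_u32be(p + 4);
}

/* Conversion to an unsigned type is always modular, so this stores the
   32-bit two's-complement pattern of v (v must be in [-2^31, 2^31 - 1]). */
static void put_i32be(unsigned char *p, int_least32_t v)
{
    put_u32be(p, (uint_least32_t)v);
}

/* Decode without converting an out-of-range unsigned value to a signed type.
   Patterns >= 2^31 are negative: value = -(0xFFFFFFFF - u) - 1.
   If int_least32_t cannot hold -2^31 (non-two's-complement machine),
   that single pattern still cannot be represented. */
static int_least32_t get_i32be(const unsigned char *p)
{
    uint_least32_t u = get_u32be(p);
    if (u <= 0x7FFFFFFFu)
        return (int_least32_t)u;
    return -(int_least32_t)(0xFFFFFFFFu - u) - 1;
}

/* ASSUMPTION: double is IEEE 754 binary64 and shares uint64_t's byte order. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits");

static void put_f64be(unsigned char *p, double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* copy the bit pattern, no aliasing cast */
    put_u64be(p, bits);
}

static double get_f64be(const unsigned char *p)
{
    uint64_t bits = get_u64be(p);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

A caller writes one field at a time, for example `unsigned char buf[8]; put_u64be(buf, x); fwrite(buf, 1, sizeof buf, f);`. This avoids dumping a struct in its native memory layout.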

An interface for serializing values held in chars of arbitrary width into standard 8-bit octets would fall in the gap between two standards. One is the C standard, which doesn't really acknowledge 8-bit bytes. The other is whatever standard (ITU?) establishes the octet as the fundamental unit of transmission. And the older standards aren't getting revised.

Now that C11 has many optional components, a binary serialization extension could be added alongside things like threads without placing demands on existing implementations.

Would such an extension be useful, or is worrying about non-two's-complement machines just that pointless?


Solution

  • I've never used them, but I think Google's Protocol Buffers satisfy your requirements.

    • 64 bit types, signed/unsigned, and floating point types are all supported.
    • The generated API is type-safe
    • Serialisation can be done to/from streams

    This tutorial seems like a pretty good introduction, and you can read about the actual binary storage format here.
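The binary format mentioned above is built mostly on base-128 varints. It also uses ZigZag encoding for the sint32/sint64 field types, so that small negative numbers stay short. Below is a minimal C sketch of those two pieces, assuming uint64_t and int64_t exist. The function names are hypothetical. This is not the API of Google's library or of either C port. As an example of the format, 300 encodes as the two octets `AC 02`.

```c
#include <stddef.h>
#include <stdint.h>

/* Base-128 varint: 7 payload bits per octet, least significant group first,
   high bit set on every octet except the last. Writes at most 10 octets and
   returns the number written. */
static size_t put_varint(unsigned char *p, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        p[n++] = (unsigned char)((v & 0x7F) | 0x80);
        v >>= 7;
    }
    p[n++] = (unsigned char)v;
    return n;
}

/* Returns the number of octets consumed, or 0 if the input is truncated or
   runs past 10 octets without a terminating octet. */
static size_t get_varint(const unsigned char *p, size_t len, uint64_t *out)
{
    uint64_t v = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(p[i] & 0x7F) << (7 * i);
        if (!(p[i] & 0x80)) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}

/* ZigZag: 0->0, -1->1, 1->2, -2->3, ... Uses unsigned arithmetic so that no
   negative value is ever shifted. */
static uint64_t zigzag_encode(int64_t n)
{
    uint64_t u = (uint64_t)n; /* modular conversion, always defined */
    return (u << 1) ^ (n < 0 ? UINT64_MAX : 0);
}

/* Odd codes are negative: value = -(z >> 1) - 1, computed without overflow. */
static int64_t zigzag_decode(uint64_t z)
{
    uint64_t mag = z >> 1;
    return (z & 1) ? -(int64_t)mag - 1 : (int64_t)mag;
}
```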


    From their web page:

    What Are Protocol Buffers?

    Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.

    There's no official implementation in pure C (only C++), but there are two C ports that might fit your needs:

    I don't know how they fare in the presence of non-8-bit bytes, but it should be relatively easy to find out.