
When to use fixed value protobuf type? Or under what scenarios?


I want to transfer a serialized protobuf message over TCP and I've tried to use the first field to indicate the total length of the serialized message.

I know that an int32 is varint-encoded, so its encoded length depends on its value. So maybe a fixed32 is a better choice.

But at the end of the Encoding chapter, I found that I can't rely on this even if I use a fixed32 with field number 1, because the Field Order section says that the order of fields may change.

My question is when do I use fixed value types? Are there any example scenarios?


Solution

  • "My question is when do I use fixed value types?"

    When it comes to serializing values, there is always a tradeoff. The Protobuf documentation lists a few options for 32-bit integers:

    int32: Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.

    uint32: Uses variable-length encoding.

    sint32: Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.

    fixed32: Always four bytes. More efficient than uint32 if values are often greater than 2^28.

    sfixed32: Always four bytes.

    int32 is a variable-length type. Anything that the type itself does not specify has to be expressed in the serialized data. To deserialize a variable-length number, the parser needs to know where it ends, and that information is stored in the message as well, which costs space: a protobuf varint uses one bit of every byte as a continuation flag, leaving 7 bits of payload per byte. The sign costs space too: a negative int32 is sign-extended and always takes 10 bytes. The resulting message may be smaller than with a fixed-width type, but it may be larger as well.

    Say we have a lot of integers between 0 and 255 to encode. As a varint, each one takes at most two bytes (one byte for values up to 127), which is cheaper than sending a full 32-bit (4-byte) integer. On the other hand, a large value that needs all 32 bits takes 5 bytes as a varint, because only 7 bits of each byte carry data. In this case a fixed32 is more efficient: it is always 4 bytes, so nothing extra has to be serialized to say how long it is.

    The documentation for fixed32 says the same thing: the tradeoff point is around 2^28 (for unsigned integers), because four varint bytes hold only 4 × 7 = 28 bits of payload.
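
    The size tradeoff can be checked with a minimal sketch of the base-128 varint encoding. The helper names `encode_varint` and `varint_size` are made up for this illustration and are not part of any protobuf library, and the sizes cover only the value payload, not the field tag that precedes it.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # 7 payload bits per byte
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def varint_size(value: int) -> int:
    """Number of bytes the varint payload occupies."""
    return len(encode_varint(value))


FIXED32_SIZE = 4  # fixed32 is always 4 bytes

for v in (1, 127, 128, 300, 2**28 - 1, 2**28, 2**32 - 1):
    print(f"{v:>12}: varint={varint_size(v)} bytes, fixed32={FIXED32_SIZE} bytes")
```

    Under these assumptions the varint payload is 1 byte up to 127, 4 bytes up to 2^28 - 1, and 5 bytes from 2^28 upward, which is where fixed32 starts to win.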

    So some types are good [as in, more efficient in terms of storage space] for large values, some for small values, some for positive/negative values. It all depends on what the actual values represent.
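
    The positive/negative difference can be sketched the same way. This assumes the encoding described in the Protobuf documentation: a negative int32 is sign-extended to 64 bits before varint encoding, while sint32 applies ZigZag mapping first. The function names are hypothetical and the lengths exclude the field tag.

```python
def zigzag32(n: int) -> int:
    """ZigZag-map a signed 32-bit integer: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint_len(u: int) -> int:
    """Byte length of a non-negative integer encoded as a varint."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def int32_wire_len(n: int) -> int:
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return varint_len(n & 0xFFFFFFFFFFFFFFFF)


def sint32_wire_len(n: int) -> int:
    return varint_len(zigzag32(n))


SFIXED32_LEN = 4  # sfixed32 is always 4 bytes

for n in (1, -1, 64, -64, -2**31):
    print(f"{n:>12}: int32={int32_wire_len(n)}, "
          f"sint32={sint32_wire_len(n)}, sfixed32={SFIXED32_LEN}")
```

    In this sketch -1 costs 10 bytes as int32 but 1 byte as sint32, while sfixed32 stays at 4 bytes whatever the value.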

    "Are there any example scenarios?"

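    32-bit hashes (e.g. CRC-32) and IPv4 addresses/masks: these values are spread over the whole 32-bit range, so most of them would need 5 bytes as a varint. Predictable message sizes could also be relevant.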
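
    As a rough illustration of these scenarios, the sketch below packs an IPv4 address, a netmask, and a CRC-32 the way a fixed32 payload is laid out (4 bytes, little-endian) and compares that with the varint length. The sample address and input bytes are arbitrary, and `encode_fixed32` and `varint_len` are illustrative helpers rather than protobuf APIs.

```python
import ipaddress
import struct
import zlib


def encode_fixed32(value: int) -> bytes:
    """fixed32 payload: always 4 bytes, little-endian."""
    return struct.pack("<I", value)


def varint_len(u: int) -> int:
    """Byte length of a non-negative integer encoded as a varint."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


samples = {
    "IPv4 address": int(ipaddress.IPv4Address("192.168.1.10")),
    "IPv4 netmask": int(ipaddress.IPv4Address("255.255.255.0")),
    "CRC-32": zlib.crc32(b"hello protobuf"),
}

for name, value in samples.items():
    payload = encode_fixed32(value)
    print(f"{name:>12}: fixed32={len(payload)} bytes, "
          f"varint={varint_len(value)} bytes")
```

    Both address values here are above 2^28, so each would take 5 bytes as a uint32 varint but always 4 bytes as fixed32.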