Search code examples
integerprotocol-buffersdata-representationvarint

Why is varint an efficient data representation?


I am currently studying the documentation of protocol buffers. Varints are described as:

Each byte in a varint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first.

My question is why one would choose a representation that looses one bit on every byte? What are the benefits from this approach?


Solution

  • It's to save space/bandwidth e.g. many programming languages and protocols have a fixed sized data types. e.g. an uint8_t, an uint16_t, uint32_t etc. and these takes up a fixed amount of bytes regardless of how big the value is. e.g. if you store the value 2 in an uint32_t, that takes up 4 bytes.

    With an encoding such as varint used in protobuf, small values can take up a smaller space, and the value 2 only needs 1 byte of space to be transferred on the wire, while still being flexible enough to not restrict the range of the value that can be used.

    This is often a net win if small values are more common than big values - which is often the case.