Search code examples
intintegerprotocol-buffersvarint

How do varints take less space?


I'm trying to learn about varints and the best thing I found is this Google Protocol Buffers spec.

In their example, they show that this number 1010 1100 0000 0010, when encoded with varints, is 300 as opposed to 44034.

Normally the number 300 takes two bytes (1 0010 1100), but so does the 300 in their example. How do varints actually take less bytes than normal ints?


Solution

  • 300 takes two bytes normally, if you're using a 2-byte fixed format to represent it. If you're using a 4-byte or 8-byte format, it takes 4 bytes or 8 bytes. If you're using a 1-byte format, you can't represent 300 at all, unless you're using a really weird encoding.

    If you want to use a variable-length encoding, then a standard 2's complement representation isn't enough, because that has no length information. You wouldn't have any idea where the number stops. You could encode the length separately, but that would increase your space requirement... or you could use something like protobuf varint representation, which can represent 300 in 2 bytes and includes a signal that the number is complete at that point.

    (Or you could use one of plenty of other encodings. Protobuf varints aren't unambiguously superior to other representations, even in terms of space used. There are always tradeoffs.)