
How does protobuf field ordering affect on-the-wire size?


I couldn't find a clear answer to this in the docs or online (feel free to point me in the right direction), but does the ordering of fields in a protobuf message (using syntax = "proto3") affect the resulting on-the-wire size? For example, say I have the following message:

message Message {
    int32 x = 1;
    string y = 2;
}

If I specify this as

message Message {
    string y = 1;
    int32 x = 2;
}

will the on-the-wire size of something like Message {x = 1, y = "foo"} differ between the two message specs, or will it be the same?

How does this work with repeated fields? For example:

message Msg {
    int32 x = 1;
    repeated string strs = 2;
}

vs

message Msg {
    repeated string strs = 1;
    int32 x = 2;
}

In this case is it more beneficial to put the repeated field first or the int field?


Solution

  • Order has no impact. A field header takes the same space wherever it appears in the message; it does not depend on the data before it. With repeated fields, the main factor for space is whether primitives such as integers are "packed", but (a) this doesn't apply to strings, and (b) suitable repeated fields are packed by default in proto3 if they have enough elements to achieve a benefit (historically you had to opt in to the packed layout in the proto schema).

    What can matter is the field number: small field numbers take less space. To work out the size of a field header, take the bit length of the field number, add 3 bits (to encode the "wire type"), then divide by 7 and round up; the result is the number of bytes needed to encode the header. Field numbers 1 through 15 therefore fit in a single byte, while 100001 needs three. So: prefer fields 1, 2 and 3 over fields 100001, 100002 and 100003.
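
To make the arithmetic concrete, here is a rough Python sketch that hand-encodes the question's Message {x = 1, y = "foo"} under both schemas and applies the header-size rule described above. It does not use the protobuf library; the helper names (encode_varint, tag_size, encode_tag, and so on) are made up for illustration, and it assumes x is a non-negative int32 written as a plain varint. Note that both schemas in the question use field numbers 1 and 2, so every header is one byte either way.

```python
VARINT, LEN = 0, 2  # protobuf wire types for varints and length-delimited data


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag_size(field_number: int) -> int:
    """Header size in bytes: bit length + 3 wire-type bits, divided by 7, rounded up."""
    bits = field_number.bit_length() + 3
    return (bits + 6) // 7


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field header is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_tag(field_number, VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_tag(field_number, LEN) + encode_varint(len(data)) + data


# Message {x = 1, y = "foo"} under the two schemas from the question.
x_first = encode_int_field(1, 1) + encode_string_field(2, "foo")
y_first = encode_string_field(1, "foo") + encode_int_field(2, 1)
print(len(x_first), len(y_first))  # 7 7

# Header size grows with the field number, not with the field's position.
for n in (1, 15, 16, 2047, 2048, 100001):
    print(n, tag_size(n), len(encode_tag(n, VARINT)))
```

The two encodings come out at 7 bytes each, and tag_size agrees with the length of the actual encoded header for every field number tried, including the jump from one byte to two at field 16.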