java protocol-buffers protobuf-java

Protobuf deserialization speed up via hiding extra fields


I'm not too familiar with protobuf's Java implementation, so I'm not certain whether this will yield any benefit, but here is a thought I had.

Say I have a large protobuf message that looks like the following:

message data {
    required string field1 = 1;
    required int64 field2 = 2;
    optional string field3 = 3;
    ...
    required string field1000 = 1000;
}

The example above is meant to show a message with 1000 fields, a mix of required and optional, with a mix of data types.

Suppose I only wanted, say, fields 1 and 3, and gave the deserializer the following definition instead:

message data {
    required string field1 = 1;
    optional string field3 = 3;
}

First of all, what would happen here? Would the deserializer still attempt to parse all the fields, or would it know to look only for the fields with the given field numbers?


Solution

  • A field's position in the serialized bytes is independent of its field number, so the parser cannot jump straight to the fields it wants; it still has to read the tag of every field in the message. Removing fields from the definition will likely help, but the gain depends on the types of the fields that become unknown.

    Unknown varint fields (int32, int64, ...) must still be decoded byte by byte to find where they end, but fixed-size fields (float, double, fixed32, ...) and length-delimited fields (messages, strings, bytes) can be skipped efficiently, because their size is known up front.
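
To make the difference between the wire types concrete, here is a minimal sketch of a scanner that walks the wire format by hand. It is only an illustration and is not how protobuf-java is implemented internally: the class WireSkipSketch and its methods (extractStrings, sampleMessage, writeVarint, writeString) are hypothetical names made up for this example, and it assumes a message that contains only varint, fixed32, fixed64 and length-delimited fields (no groups). The wire type numbers (0, 1, 2, 5) and the tag layout (field number shifted left by 3, OR'ed with the wire type) come from the protobuf encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hand-rolled illustration of the protobuf wire format; NOT protobuf-java's real parser. */
class WireSkipSketch {
    // Wire types defined by the protobuf encoding.
    static final int VARINT = 0;
    static final int FIXED64 = 1;
    static final int LENGTH_DELIMITED = 2;
    static final int FIXED32 = 5;

    private final byte[] buf;
    private int pos;

    WireSkipSketch(byte[] buf) {
        this.buf = buf;
    }

    /** Decodes a base-128 varint; every byte must be inspected to find where it ends. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed varint");
    }

    /** Returns the wanted string fields; all other fields are skipped. */
    static Map<Integer, String> extractStrings(byte[] data, Set<Integer> wanted) {
        WireSkipSketch in = new WireSkipSketch(data);
        Map<Integer, String> out = new LinkedHashMap<>();
        while (in.pos < data.length) {
            int tag = (int) in.readVarint(); // the tag of every field is always read
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x7;
            switch (wireType) {
                case VARINT:
                    in.readVarint();         // no length prefix: walk it byte by byte
                    break;
                case FIXED64:
                    in.pos += 8;             // constant-size jump
                    break;
                case FIXED32:
                    in.pos += 4;             // constant-size jump
                    break;
                case LENGTH_DELIMITED: {
                    int len = (int) in.readVarint();
                    if (wanted.contains(fieldNumber)) {
                        out.put(fieldNumber, new String(data, in.pos, len, StandardCharsets.UTF_8));
                    }
                    in.pos += len;           // unwanted payloads are jumped over, not decoded
                    break;
                }
                default:
                    throw new IllegalStateException("unsupported wire type " + wireType);
            }
        }
        return out;
    }

    /** Sample message: field 1 = "alpha", field 2 = 300 (varint), field 3 = "gamma", field 4 = 0.0 (double). */
    static byte[] sampleMessage() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, "alpha");
        writeVarint(out, (2 << 3) | VARINT);
        writeVarint(out, 300);
        writeString(out, 3, "gamma");
        writeVarint(out, (4 << 3) | FIXED64);
        out.write(new byte[8], 0, 8);
        return out.toByteArray();
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | LENGTH_DELIMITED);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        // Only fields 1 and 3 are wanted; fields 2 and 4 are skipped.
        System.out.println(extractStrings(sampleMessage(), Set.of(1, 3))); // prints {1=alpha, 3=gamma}
    }
}
```

Note that the loop still visits every field in the message, wanted or not. The saving from a trimmed-down definition comes from what happens after the tag is read: a string or nested message is passed over with a single position bump, while a varint has to be decoded in full just to find the start of the next field.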