We are developing a set of C++ applications that exchange data through protobuf messages. One of the messages we want to exchange contains a list of type-value pairs. The type is just an integer; the value can be one of a number of different data types, ranging from basic ones like integer or string to more complex ones like IP addresses or prefixes. For every specific type, however, only one data type is allowed for the value:
| type | value data type |
|------|-----------------|
| 1    | string          |
| 2    | integer         |
| 3    | list<ip_addr>   |
| 4    | integer         |
| 5    | struct          |
| 6    | string          |
| ...  | ...             |
Note: one of the communicating apps will ultimately encode this list of type-value pairs into a byte array in a network packet according to a fixed protocol format.
There are a few ways to encode this into a protobuf message, but we're currently leaning towards creating a separate protobuf message for each type number:
    message Type1
    {
        string value = 1;
    }

    message Type2
    {
        int32 value = 1;  // protobuf has no "integer" scalar type; int32 assumed here
    }

    message Type3
    {
        repeated IpAddr value = 1;
    }

    ...

    message TVPair
    {
        oneof type
        {
            Type1 type_1 = 1;
            Type2 type_2 = 2;
            Type3 type_3 = 3;
            ...
        }
    }

    message Foo
    {
        repeated TVPair tv_pairs = 1;
    }
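The IpAddr message is assumed to be defined elsewhere. Purely to make the sketch self-contained, a hypothetical minimal definition could look like this (the field layout is an assumption, not part of the actual schema):

    // Hypothetical IpAddr message, assumed to exist elsewhere in the real
    // schema; shown here only so the example above is self-contained.
    message IpAddr
    {
        oneof addr
        {
            fixed32 v4 = 1;  // IPv4 address as a 32-bit value
            bytes   v6 = 2;  // IPv6 address as 16 raw bytes
        }
    }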
This is clear and easy to use for all applications, and it hides the details of the network protocol encoding in the only app that actually needs to take care of it.
The only worry I have is that the list of type numbers is on the order of a few hundred items. This means a few hundred protobuf messages need to be defined, and the oneof structure in the TVPair message will contain that number of members. I know field numbers in protobuf messages can go a lot higher (~500,000,000; the exact maximum is 536,870,911), so that's not really an issue. But are there any downsides to having hundreds of fields in a single protobuf message?
The comment from @DazWilkin pointed me towards some best practices on the Protocol Buffers documentation website:
Don’t Make a Message with Lots of Fields
Don’t make a message with “lots” (think: hundreds) of fields. In C++ every field adds roughly 65 bits to the in-memory object size whether it’s populated or not (8 bytes for the pointer and, if the field is declared as optional, another bit in a bitfield that keeps track of whether the field is set). When your proto grows too large, the generated code may not even compile (for example, in Java there is a hard limit on the size of a method).
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are a collection of small pieces, where each small piece is structured data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you want something more like a database. Each solution should be developed as a separate library, so that only those who need it need pay the costs.
So although it might be technically possible, it is not advised to create big messages with lots of fields.
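Given that advice, one way to keep the oneof small while still exchanging the same data is to key the oneof on the value's data type instead of on the protocol type number, and carry the type number as a plain integer. This is just a sketch of an alternative (the message and field names here are my own, and note that repeated fields are not allowed directly inside a oneof, hence the wrapper message):

    // Alternative sketch: one oneof member per *data type*, not per type
    // number, so the oneof stays small no matter how many type numbers
    // the protocol defines.
    message TVPair
    {
        uint32 type = 1;  // the protocol's type number (1, 2, 3, ...)

        oneof value
        {
            string     string_value  = 2;
            int64      int_value     = 3;
            IpAddrList ip_list_value = 4;  // wrapper, since "repeated" cannot
                                           // appear directly inside a oneof
        }
    }

    // Wrapper for the list<ip_addr> case (IpAddr as sketched earlier).
    message IpAddrList
    {
        repeated IpAddr values = 1;
    }

The trade-off is that the schema itself no longer enforces which data type belongs to which type number; that pairing would have to be validated in code, presumably by the one app that already owns the network protocol details.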