I have a complex object that holds a million ints:
int[] ints = new int[1000000];
If I save those values directly via ByteBuffer, the file size is about 4MB (4 bytes per int).
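Roughly, the direct ByteBuffer write looks like this (a simplified sketch; the class and file names are illustrative):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class RawWriter {
    public static void main(String[] args) throws IOException {
        int[] ints = new int[1000000];
        // 4 bytes per int -> 4,000,000 bytes of payload
        ByteBuffer buf = ByteBuffer.allocate(ints.length * Integer.BYTES);
        buf.asIntBuffer().put(ints); // bulk-copy, no boxing
        try (FileChannel ch = FileChannel.open(Paths.get("ints.raw"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(buf);
        }
    }
}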
When I save those values to a protocol buffer object, it stores each value not as an int but as an Integer. When I then write that byte[] stream to a file, the file size is over 8MB.
It seems protocol buffers do not provide a primitive array type. Is there any way (or trick) to reduce the byte[] size of a protocol buffer object that contains a million ints?
"When I save those values to a protocol buffer object"
How exactly are you doing that? Normally, with protobuf, you define some type in a .proto schema; the obvious contender here would be:
syntax = "proto3";
message Whatever {
    repeated int32 ints = 1;
}
In proto3, "packed" is the default for repeated scalar numeric fields, so this should use "packed" encoding, giving a size that is... well, slightly dependent on the data, since it uses "varint" encoding, but for 1,000,000 elements it could be anywhere between 1,000,004 and 10,000,004 bytes (between 1 and 10 bytes per element, plus 1 byte for the field header and 3 bytes for the length; 10 bytes per element usually means negative numbers encoded as int32).
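For example, with that schema compiled for Java, building and serializing could look like this (a sketch; Whatever is the class protoc would generate, assumed to be in the same package, and the output file name is just an example):

import java.io.FileOutputStream;
import java.io.IOException;

public class ProtoWriter {
    public static void main(String[] args) throws IOException {
        int[] ints = new int[1000000];
        Whatever.Builder builder = Whatever.newBuilder();
        for (int value : ints) {
            builder.addInts(value); // packed repeated int32 field
        }
        Whatever msg = builder.build();
        // All-zero values encode as 1 varint byte each, so expect
        // roughly 1,000,004 bytes here (payload + header + length)
        System.out.println("serialized size: " + msg.getSerializedSize());
        try (FileOutputStream out = new FileOutputStream("ints.pb")) {
            msg.writeTo(out); // serialize the whole message to the file
        }
    }
}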
If you know the values are often negative, or often large, you could choose to use sint32 (uses zig-zag encoding; avoids the 10 bytes for negative numbers) or sfixed32 (always uses 4 bytes per element) instead of int32, but the "packed" encoding should still apply.
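You can check those per-element costs directly with protobuf-java's CodedOutputStream size helpers (a quick sketch; the example values are arbitrary):

import com.google.protobuf.CodedOutputStream;

public class WireSizes {
    public static void main(String[] args) {
        // int32: small non-negative values are cheap, negatives cost 10 bytes
        System.out.println(CodedOutputStream.computeInt32SizeNoTag(1));    // 1
        System.out.println(CodedOutputStream.computeInt32SizeNoTag(-1));   // 10
        // sint32: zig-zag keeps small-magnitude negatives cheap
        System.out.println(CodedOutputStream.computeSInt32SizeNoTag(-1));  // 1
        // sfixed32: always 4 bytes, regardless of value
        System.out.println(CodedOutputStream.computeSFixed32SizeNoTag(-1)); // 4
    }
}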
In proto2, you need to opt in to "packed":
syntax = "proto2";
message Whatever {
    repeated int32 ints = 1 [packed=true];
}