
Smallest way to save a Java int[] with Protocol Buffers 3?


I have a complex object that holds a million ints:

int[] ints = new int[1000000];

If I save those values directly via ByteBuffer, the file size is about 5 MB.

When I save those values to a protocol buffer object, each value is stored not as an int but as an Integer. When I then write the resulting byte[] to a file, the file size is over 8 MB.

It seems Protocol Buffers does not provide a primitive array type.

Is there any way (or trick) to reduce the byte[] size of a protocol buffer object that contains a million ints?


Solution

  • When I save those values to a protocol buffer object

    How exactly are you doing that? Normally, with protobuf, you define some type in a .proto schema; the obvious contender here would be:

    syntax = "proto3";
    message Whatever {
        repeated int32 ints = 1;
    }
    

    In proto3, "packed" encoding is the default for repeated scalar numeric fields, so this should use packed encoding. The resulting size depends on the data, because int32 uses "varint" encoding: for 1,000,000 elements it could be anywhere between 1,000,004 and 10,000,005 bytes. That is 1 to 10 bytes per element, plus 1 byte for the field header and 3 or 4 bytes for the length prefix (the prefix is itself a varint, so it grows to 4 bytes for a 10,000,000-byte payload). 10 bytes per element usually means negative numbers encoded as int32.
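    To make the size range concrete, here is a minimal sketch that estimates the wire size of a packed repeated int32 field without using the protobuf library at all. The class and method names (PackedInt32Size, varintSize, int32Size, packedFieldSize) are hypothetical helpers, and the sketch assumes a field number between 1 and 15 so that the field header fits in one byte. Note that the length prefix is itself a varint, so it takes 3 bytes for a 1,000,000-byte payload and 4 bytes for a 10,000,000-byte one.

```java
// Hypothetical helper, not part of the protobuf library: estimates the wire
// size of a packed `repeated int32` field whose field number is 1 to 15.
final class PackedInt32Size {

    // Bytes needed to write a 64-bit value as a varint (7 payload bits per byte).
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // int32 values are sign-extended to 64 bits before varint encoding,
    // which is why every negative value costs 10 bytes.
    static int int32Size(int value) {
        return varintSize((long) value);
    }

    // Field header (1 byte, assumed) + varint length prefix + payload bytes.
    static long packedFieldSize(int[] values) {
        long payload = 0;
        for (int v : values) {
            payload += int32Size(v);
        }
        return 1 + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        int[] zeros = new int[1_000_000];        // best case: 1 byte per element
        int[] negatives = new int[1_000_000];
        java.util.Arrays.fill(negatives, -1);    // worst case: 10 bytes per element
        System.out.println(packedFieldSize(zeros));     // prints 1000004
        System.out.println(packedFieldSize(negatives)); // prints 10000005
    }
}
```

    Running the same estimate over your real data should tell you roughly where in that range your file will land before you change the schema.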

    If you know the values are often negative, or often large, you could use sint32 (zig-zag encoding; avoids the 10 bytes for negative numbers) or sfixed32 (always 4 bytes per element) instead of int32; packed encoding still applies to both.
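    The trade-off between the three types can be sketched per element. The following is an illustration only, with hypothetical names (Int32EncodingChoice and its methods); it reproduces the standard zig-zag mapping and varint sizing by hand rather than calling the protobuf library.

```java
// Hypothetical helper: per-element wire size for int32, sint32 and sfixed32.
final class Int32EncodingChoice {

    // Zig-zag maps signed to unsigned so small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Bytes needed to write a 64-bit value as a varint (7 payload bits per byte).
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // int32: sign-extended to 64 bits, so negatives always take 10 bytes.
    static int int32Size(int v) {
        return varintSize((long) v);
    }

    // sint32: zig-zag first, then varint of the unsigned 32-bit result (1 to 5 bytes).
    static int sint32Size(int v) {
        return varintSize(zigZag(v) & 0xFFFFFFFFL);
    }

    // sfixed32: always 4 bytes, regardless of the value.
    static int sfixed32Size(int v) {
        return 4;
    }

    public static void main(String[] args) {
        int[] samples = {0, 64, -1, 1_000_000, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.println(v + ": int32=" + int32Size(v)
                    + " sint32=" + sint32Size(v)
                    + " sfixed32=" + sfixed32Size(v));
        }
    }
}
```

    As the sketch suggests, sint32 costs one extra byte for some small positive values (64 takes 2 bytes instead of 1), while sfixed32 only wins when most values need 5 or more varint bytes.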

    In proto2, you need to opt in to "packed":

    syntax = "proto2";
    message Whatever {
        repeated int32 ints = 1 [packed=true];
    }