protocol-buffers parquet protobuf-java

ProtoParquetWriter doesn't write false, 0, and empty-string values


In the following example:

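  // SNAPPY, DEFAULT_BLOCK_SIZE and DEFAULT_PAGE_SIZE are presumably static
  // imports from CompressionCodecName and ParquetWriter.
  try (ParquetWriter<Example> writer =
      new ProtoParquetWriter<>(
          new Path("file:/tmp/foo.parquet"),
          Example.class,
          SNAPPY,
          DEFAULT_BLOCK_SIZE,
          DEFAULT_PAGE_SIZE)) {
    writer.write(
        Example.newBuilder()
            .setTs(System.currentTimeMillis())
            .setTenantId("tenant")
            .setSomeFlag(false)  // default value for bool
            .setSomeInt(1)
            .setOtherInt(0)      // default value for int32
            .build());
  }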

And the example .proto file:

syntax = "proto3";
package com.example;

message Example {
  uint64 ts = 1;
  string tenantId = 2;
  bool someFlag = 3;
  int32 someInt = 4;
  int32 otherInt = 2;
}

The resulting Parquet file won't contain the fields someFlag and otherInt, because they are false and 0 respectively.

Is there a way to make it write them anyway, or should I handle this on the reader side?


Solution

  • Historically, proto3 did not track presence for scalar fields: a field holding its zero default (0, false, or an empty string) is indistinguishable from an unset one, so it is not serialized. Fortunately this changed in newer versions of protoc (enabled by default since protoc 3.15): the optional keyword can now be used in front of fields in proto3 to enable presence tracking. So: add optional, and any compliant implementation should do what you want. The defaults are still zero/false/etc., but fields that are explicitly set are serialized, even when set to the default value.

    syntax = "proto3";
    package com.example;
    
    message Example {
      optional uint64 ts = 1;
      optional string tenantId = 2;
      optional bool someFlag = 3;
      optional int32 someInt = 4;
      optional int32 otherInt = 2; // [sic]
    }
    

    Also, the second field number 2 (on otherInt) should be 5: field numbers must be unique within a message.
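
To make the difference between the two presence rules concrete, here is a minimal toy model in plain Java. It is only a sketch: PresenceSketch, writeImplicit and writeExplicit are hypothetical names invented for this illustration, a Map stands in for the serialized output, and none of it is the generated protobuf code or the ProtoParquetWriter implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of protobuf field presence; NOT the generated protobuf API. */
public class PresenceSketch {

    /**
     * Implicit presence (proto3 field without `optional`): a value equal to
     * the type's default cannot be told apart from "unset", so it is skipped.
     */
    public static Map<String, Object> writeImplicit(boolean someFlag, int otherInt) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (someFlag) {
            out.put("someFlag", true);       // false is never written
        }
        if (otherInt != 0) {
            out.put("otherInt", otherInt);   // 0 is never written
        }
        return out;
    }

    /**
     * Explicit presence (proto3 field with `optional`): "unset" is tracked
     * separately (modeled here as null), so an explicitly set default is kept.
     */
    public static Map<String, Object> writeExplicit(Boolean someFlag, Integer otherInt) {
        Map<String, Object> out = new LinkedHashMap<>();
        if (someFlag != null) {
            out.put("someFlag", someFlag);   // written even when false
        }
        if (otherInt != null) {
            out.put("otherInt", otherInt);   // written even when 0
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(writeImplicit(false, 0));  // {}
        System.out.println(writeExplicit(false, 0));  // {someFlag=false, otherInt=0}
        System.out.println(writeExplicit(null, 0));   // {otherInt=0}
    }
}
```

In real generated Java code, a proto3 field declared optional also gets a has-method (for example hasSomeFlag()), which a reader can use to distinguish "explicitly set to false" from "never set".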