In the following example:
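import static org.apache.parquet.hadoop.ParquetWriter.DEFAULT_BLOCK_SIZE;
import static org.apache.parquet.hadoop.ParquetWriter.DEFAULT_PAGE_SIZE;
import static org.apache.parquet.hadoop.metadata.CompressionCodecName.SNAPPY;

import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.proto.ProtoParquetWriter;

// Example is the message class protoc generates from the .proto file below
try (ParquetWriter<Example> writer =
         new ProtoParquetWriter<>(
             new Path("file:/tmp/foo.parquet"),
             Example.class,
             SNAPPY,
             DEFAULT_BLOCK_SIZE,
             DEFAULT_PAGE_SIZE)) {
    writer.write(
        Example.newBuilder()
            .setTs(System.currentTimeMillis())
            .setTenantId("tenant")
            .setSomeFlag(false)   // zero default: dropped without presence tracking
            .setSomeInt(1)
            .setOtherInt(0)       // zero default: dropped without presence tracking
            .build());
}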
And the example .proto file:
syntax = "proto3";
package com.example;
message Example {
    uint64 ts = 1;
    string tenantId = 2;
    bool someFlag = 3;
    int32 someInt = 4;
    int32 otherInt = 2;
}
The resulting Parquet file won't have the fields someFlag and otherInt, because they are false and 0 respectively.
Is there a way to make it write them anyway, or should I handle this on the reader side?
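In proto3, presence tracking was historically not available for scalar fields: the only rule was that a field holding its zero default (0, false, empty string) is simply not serialized, so "set to 0" and "never set" look identical. Fortunately this changed in newer versions of protoc (experimental behind a flag in 3.12, enabled by default since 3.15). The optional keyword can now be used in front of proto3 fields to enable presence tracking. So add optional, and any compliant implementation should do what you want. The defaults are still zero/false/etc., but if a field is explicitly set, it is serialized even when its value equals the default.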
syntax = "proto3";
package com.example;
message Example {
    optional uint64 ts = 1;
    optional string tenantId = 2;
    optional bool someFlag = 3;
    optional int32 someInt = 4;
    optional int32 otherInt = 2; // [sic]
}
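To see why the zero values vanish, here is a hand-rolled sketch of the proto3 wire rule for a single varint field. It is an illustration only: it does not use protobuf-java or parquet-protobuf, the class and method names (PresenceSketch, encodeImplicit, encodeExplicit) are hypothetical, and it assumes otherInt has been renumbered to field 5. With implicit presence a zero is never written; with optional (explicit presence) the tag plus a zero byte are written once the field has been set.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of proto3 field presence on the wire.
// Not the protobuf-java API: it hand-encodes one int32 field as a varint.
class PresenceSketch {

    // Wire type 0 = VARINT (used for int32, uint64, bool, ...).
    static final int WIRE_TYPE_VARINT = 0;

    // Writes a base-128 varint, least-significant 7-bit group first.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Implicit presence (plain proto3 scalar): a zero value is skipped entirely.
    static byte[] encodeImplicit(int fieldNumber, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value != 0) {
            writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
            writeVarint(out, value);
        }
        return out.toByteArray();
    }

    // Explicit presence (proto3 `optional`): written whenever it was set,
    // even if the value is zero. null models "never set".
    static byte[] encodeExplicit(int fieldNumber, Integer valueOrNull) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (valueOrNull != null) {
            writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
            writeVarint(out, valueOrNull.longValue());
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // otherInt = 0, assuming it is field 5
        System.out.println("implicit, 0:   [" + toHex(encodeImplicit(5, 0)) + "]");    // []
        System.out.println("explicit, 0:   [" + toHex(encodeExplicit(5, 0)) + "]");    // [28 00]
        System.out.println("explicit, unset: [" + toHex(encodeExplicit(5, null)) + "]"); // []
    }
}
```

This only models the protobuf encoding rule; whether a given Parquet writer then emits a column value for an explicitly set zero still depends on it honoring presence.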
Also, the second 2 (on otherInt) should be a 5: field numbers must be unique within a message, and tenantId already uses 2.