I'm using Parquet.Net (4.23.5) to write a Parquet file. I discovered that when I write a null value into a data column, the generated Parquet file is unreadable.
What am I doing wrong?
This is the simple code to test it:
    var fields = new List<DataField>
    {
        new DataField<int>("id"),
        new DataField<string?>("city")
    };
    var schema = new ParquetSchema(fields);
    Parquet.Data.DataColumn[] columns = new Parquet.Data.DataColumn[2];
    for (int i = 0; i < 2; i++)
    {
        Type t = fields[i].ClrType;
        //var allData = getData(dataTable, i);
        columns[i] = t switch
        {
            Type when typeof(string) == t => new Parquet.Data.DataColumn(fields[i], new string?[] { "London", null }), /*"Derby" */
            Type when typeof(int) == t => new Parquet.Data.DataColumn(fields[i], new int[] { 1, 2 }),
            _ => throw new NotImplementedException(),
        };
    }
    using (Stream fileStream = System.IO.File.OpenWrite("c:\\test.parquet"))
    {
        ParquetOptions parquetOptions = new ParquetOptions { TreatByteArrayAsString = true, UseDictionaryEncoding = true, UseDeltaBinaryPackedEncoding = false };
        using (ParquetWriter parquetWriter = await ParquetWriter.CreateAsync(schema, fileStream, parquetOptions))
        {
            parquetWriter.CompressionMethod = CompressionMethod.Gzip;
            parquetWriter.CompressionLevel = System.IO.Compression.CompressionLevel.Optimal;
            // create a new row group in the file
            using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
            {
                foreach (var item in columns)
                {
                    await groupWriter.WriteColumnAsync(item);
                }
            }
        }
    }
It creates the Parquet file, but when I try to open it with ParquetViewer, the file cannot be read.
Your error is caused by this setting in your ParquetOptions: UseDeltaBinaryPackedEncoding = false
It seems the Parquet.NET library doesn't handle nullables correctly when delta binary packed encoding isn't used. I also tested with the latest version of the library, 5.0.2.
If you can live with delta binary packed encoding, setting this flag to its default, true, will resolve your error. But I would ultimately recommend opening a ticket in the project's repo to address the issue itself.
Testing locally, when the flag is true, I am able to open the Parquet file without any issues.
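As a minimal sketch of the change described above, the options from the question can be written without the UseDeltaBinaryPackedEncoding override, so that the library default applies. This assumes the default is true, as stated in this answer; the other two properties are copied unchanged from the question.

```csharp
using Parquet;

// Same options as in the question, minus the UseDeltaBinaryPackedEncoding override.
// Assumption (from the answer above): the library default for that flag is true.
var parquetOptions = new ParquetOptions
{
    TreatByteArrayAsString = true,
    UseDictionaryEncoding = true
};
```

The rest of the writing code from the question can stay as it is; only the options object passed to ParquetWriter.CreateAsync changes.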