Hey guys, I am using parquet_cpp's StreamWriter, but the output file is effectively empty. Not even the header was written: the file is only 4 bytes.
std::shared_ptr<::arrow::io::FileOutputStream> outfile_;
std::string outputFilePath_ = "/tmp/part.0.parquet";
PARQUET_ASSIGN_OR_THROW(
    outfile_,
    ::arrow::io::FileOutputStream::Open(outputFilePath_)
);
// build column names
parquet::schema::NodeVector columnNames_{};
columnNames_.push_back(
    parquet::schema::PrimitiveNode::Make(
        "Time", parquet::Repetition::REQUIRED, parquet::Type::INT64, parquet::ConvertedType::UINT_64
    )
);
columnNames_.push_back(
    parquet::schema::PrimitiveNode::Make(
        "Value", parquet::Repetition::REQUIRED, parquet::Type::INT64, parquet::ConvertedType::UINT_64
    )
);
auto schema = std::static_pointer_cast<parquet::schema::GroupNode>(
    parquet::schema::GroupNode::Make("schema", parquet::Repetition::REQUIRED, columnNames_)
);
parquet::WriterProperties::Builder builder;
parquet::StreamWriter os_ = parquet::StreamWriter {parquet::ParquetFileWriter::Open(outfile_, schema, builder.build())};
// Start writing to os_, would be in a callback function
os_ << std::uint64_t{5} << std::uint64_t{59} << parquet::EndRow;
I seem to be missing something trivial for the column names and data to be written out, but I could not find anything online. Thank you.
Yeah. The row group must be flushed too before the data reaches the file. So all I needed was to call:
os_.EndRowGroup();
While the data is now written out, the Parquet file's footer is corrupted and cannot be read. I posted a separate question HERE about the footer not being written correctly.
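For anyone debugging the same symptoms, here is a quick sanity check that needs no Parquet library. It relies on the fact that an unencrypted Parquet file begins and ends with the 4-byte magic "PAR1", with the footer metadata and a 4-byte footer length sitting just before the trailing magic. The helper name `hasParquetMagicAtBothEnds` is made up for this sketch and is not part of parquet_cpp. It only checks the magic bytes, so it cannot prove the footer is valid, only that one is missing or truncated.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not a parquet_cpp API): returns true if the file
// starts and ends with the 4-byte Parquet magic "PAR1".
// Assumes an unencrypted Parquet file.
bool hasParquetMagicAtBothEnds(const std::string& path) {
    static const std::string kMagic = "PAR1";

    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff size = in.tellg();
    // Leading magic (4) + footer length (4) + trailing magic (4) is already
    // 12 bytes, so anything smaller cannot be a finished file.
    if (size < 12) return false;

    char head[4];
    char tail[4];

    in.seekg(0, std::ios::beg);
    in.read(head, 4);
    if (!in) return false;

    in.seekg(-4, std::ios::end);
    in.read(tail, 4);
    if (!in) return false;

    return std::string(head, 4) == kMagic && std::string(tail, 4) == kMagic;
}
```

A 4-byte file like the one in the question fails this check because it is too small to hold a footer. A file with data but no trailing "PAR1" points at the writer never having been closed properly.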