c++ metadata parquet

How to write file-wide metadata into Parquet files with Apache Parquet in C++


I use Apache Parquet to create Parquet tables containing a machine's process information, and I need to store file-wide metadata (machine ID and machine name).

Parquet files are said to be capable of storing file-wide metadata; however, I couldn't find anything about it in the documentation.

Another Stack Overflow post explains how this is done with pyarrow. As far as I can tell from that post, I need some kind of key-value pairs (maybe map<string, string>) and have to add them to the schema somehow.

I found a class in the Parquet source code called parquet::FileMetaData that may serve this purpose; however, the docs say nothing about it.

Is it possible to store file-wide metadata with C++?

Currently I am using the stream_reader_writer example for writing Parquet files.


Solution

  • You can pass the file-level metadata as the optional key_value_metadata argument when calling parquet::ParquetFileWriter::Open; see the source code here.