c# apache-arrow

Compress data with Zstd using C# Apache Arrow


How do I compress data with Zstd using C# Apache Arrow?

Looking through the source, there doesn't appear to be any code for compression, only decompression. So do I have to compress the Arrow file bytes myself? And is the result readable by Arrow in other languages, or do I have to decompress it first and then feed it to Arrow?


Solution

    How do I compress data with Zstd using C# Apache Arrow?

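    Compression has not yet been added to the C# implementation, although it is available in other implementations. I believe decompression was added first because it improves general compatibility: a reader that can decompress is able to consume files produced by the other implementations.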

    Compression support for C# is tracked here and you are welcome to provide an implementation.

    So do I have to just compress the arrow file bytes?

    The Arrow IPC format currently supports compression of the individual buffers inside each record batch, rather than of the file as a whole. Regrettably, this doesn't seem to be documented in the prose spec, but you can find more details in the code.
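
    As a rough illustration of what per-buffer compression involves, the sketch below frames a single buffer the way the comments in the format's Message.fbs describe it: an 8-byte little-endian signed integer holding the uncompressed length, followed by the compressed bytes, with a length of -1 meaning the bytes that follow were left uncompressed. The `BufferFraming` class and the `compress` delegate are hypothetical names introduced here, not part of Apache.Arrow; the delegate stands in for whichever Zstd encoder you choose. Padding and the buffer offsets and lengths recorded in the record batch metadata are not handled.

    ```csharp
    using System;
    using System.Buffers.Binary;

    // Hypothetical helper, not part of the Apache.Arrow library.
    static class BufferFraming
    {
        // Frames one IPC body buffer: 8-byte little-endian int64 prefix, then payload.
        // `compress` is an assumed stand-in for a Zstd (or LZ4 frame) encoder.
        public static byte[] Frame(byte[] raw, Func<byte[], byte[]> compress)
        {
            byte[] compressed = compress(raw);

            // Store the buffer as-is when compression does not make it smaller.
            bool useCompressed = compressed.Length < raw.Length;
            byte[] payload = useCompressed ? compressed : raw;

            // Prefix is the uncompressed length, or -1 for "not compressed".
            long prefix = useCompressed ? raw.Length : -1L;

            var framed = new byte[sizeof(long) + payload.Length];
            BinaryPrimitives.WriteInt64LittleEndian(framed.AsSpan(0, sizeof(long)), prefix);
            payload.CopyTo(framed, sizeof(long));
            return framed;
        }
    }
    ```

    Every buffer in the record batch body would be framed like this, and the lengths written to the metadata would have to describe the framed buffers rather than the raw ones.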

    And is this readable by arrow in other languages, or do I have to decompress first and then feed to arrow?

    If you compress the buffers yourself, you will also need to set the compression field in each record batch's metadata to indicate that the buffers are compressed and with which codec. Other implementations will then be able to read the file without any extra configuration.

    If you would rather compress the entire file and decompress it before passing it on to another implementation, that should also work. However, the compression and decompression will then be your responsibility, because an Arrow reader will not recognize the compressed file as Arrow data.
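
    A minimal sketch of that whole-file approach, assuming the third-party ZstdSharp NuGet package (any Zstd binding would do; its `Compressor.Wrap` and `Decompressor.Unwrap` calls are used here as I understand them from that package's README). The `WholeFileZstd` class and the file paths are hypothetical, and nothing in it touches the Arrow API: it only wraps bytes that an Arrow writer has already produced.

    ```csharp
    using System.IO;
    using ZstdSharp; // assumed third-party package, not part of Apache.Arrow

    // Hypothetical helper for compressing an already-written Arrow file as a whole.
    static class WholeFileZstd
    {
        // Reads the Arrow file and writes a single Zstd frame containing all of its bytes.
        public static void CompressFile(string arrowPath, string zstPath, int level = 3)
        {
            byte[] arrowBytes = File.ReadAllBytes(arrowPath);
            using var compressor = new Compressor(level);
            File.WriteAllBytes(zstPath, compressor.Wrap(arrowBytes).ToArray());
        }

        // Returns the original Arrow file bytes, ready to hand to an Arrow reader.
        public static byte[] DecompressFile(string zstPath)
        {
            using var decompressor = new Decompressor();
            return decompressor.Unwrap(File.ReadAllBytes(zstPath)).ToArray();
        }
    }
    ```

    Because the output is an ordinary Zstd frame, a consumer in another language can decompress it with any Zstd library and then pass the resulting bytes to its own Arrow reader. This loads the whole file into memory, so a streaming variant would be preferable for large files.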