Compression options in fastparquet are not consistent

According to the fastparquet project page, fastparquet supports various compression methods:

Optional (compression algorithms; gzip is always available):
snappy (aka python-snappy)
lzo
brotli
lz4
zstandard

In particular, zstandard is a modern algorithm that provides high compression ratios as well as impressively fast compression and decompression. This is what I want to use with fastparquet.

But the documentation for fastparquet.write says:

compression to apply to each column, e.g. GZIP or SNAPPY or a dict like {"col1": "SNAPPY", "col2": None} to specify per column compression types. In both cases, the compressor settings would be the underlying compressor defaults. To pass arguments to the underlying compressor, each dict entry should itself be a dictionary:
{
    "col1": {
        "type": "LZ4",
        "args": {
            "compression_level": 6,
            "content_checksum": True
        }
    },
    "col2": {
        "type": "SNAPPY",
        "args": None
    },
    "_default": {
        "type": "GZIP",
        "args": None
    }
}

Nothing is mentioned about zstandard. What is worse, if I write

fastparquet.write('outfile.parq', df, compression='LZ4')

it raises an error saying

Compression 'LZ4' not available. Options: ['GZIP', 'UNCOMPRESSED']

So fastparquet only supports 'GZIP'? This is quite a discrepancy from the project page! Am I missing some packages? How can I use fastparquet with all of the compression algorithms listed on the project page?

Solution

Yes, you may be missing some packages. The optional compressors are only available if the corresponding Python bindings (for LZ4, zstandard, and so on) are installed on your system. See the source code for more details.

For LZ4: if import lz4.block raises a ModuleNotFoundError, install it with pip install lz4.
Similarly for zstandard: pip install zstandard
And for brotli: pip install brotlipy
And lzo: pip install python-lzo
And snappy: pip install python-snappy
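
To see which of these bindings are missing in one go, a small check along the following lines may help. It is only a sketch using the standard library: the import names in CODEC_MODULES are my assumption about what each pip package provides (only lz4.block is taken from the text above), and the helper names is_importable and missing_packages are made up for this example, not part of fastparquet.

```python
import importlib.util

# Assumed mapping: import name -> pip package from the list above.
# Only "lz4.block" is named in the answer; the other import names are assumptions.
CODEC_MODULES = {
    "snappy": "python-snappy",
    "lzo": "python-lzo",
    "brotli": "brotlipy",
    "lz4.block": "lz4",
    "zstandard": "zstandard",
}


def is_importable(module_name):
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For a dotted name such as "lz4.block", find_spec imports the parent
        # package first and raises ModuleNotFoundError if it is absent.
        return False


def missing_packages(modules=CODEC_MODULES):
    """Return the sorted pip package names whose import name cannot be found."""
    return sorted(pkg for name, pkg in modules.items() if not is_importable(name))


if __name__ == "__main__":
    for pkg in missing_packages():
        print(f"pip install {pkg}")
```

If you install a package while a Python session is already running, it is probably safest to restart that session before calling fastparquet.write again, since the list of options in the error message looks like it is built when fastparquet is imported.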