Tags: python, python-polars, zstd, compressed-files

How to read_csv a zstd-compressed file using python-polars


Unlike pandas, polars doesn't natively support reading zstd-compressed CSV files.

How can I get polars to read a compressed CSV file, for example using xopen?

I've tried this:

from xopen import xopen
import polars as pl

with xopen("data.csv.zst", "r") as f:
    d = pl.read_csv(f)

but this errors with:

pyo3_runtime.PanicException: Expecting to be able to downcast into bytes from read result.: PyDowncastError
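The panic message suggests that polars calls `read()` on the handle and expects `bytes` back, whereas opening with mode "r" gives a text-mode handle whose `read()` returns `str`. Below is a minimal stdlib-only sketch of that difference. It assumes `gzip` as a stand-in for zstd (only for portability; xopen handles both formats), and `read_all_bytes` is a hypothetical helper that mimics the check implied by the panic message, not polars' actual code.

```python
import gzip
import io

CSV_TEXT = "a,b\n1,2\n3,4\n"
# gzip stands in for zstd here because it ships with the standard library.
COMPRESSED = gzip.compress(CSV_TEXT.encode("utf-8"))


def read_all_bytes(handle) -> bytes:
    """Hypothetical helper: read a whole file-like object and insist on bytes,
    mirroring the check implied by "downcast into bytes from read result"."""
    data = handle.read()
    if not isinstance(data, bytes):
        raise TypeError(f"read() returned {type(data).__name__}, expected bytes")
    return data


# Text mode ("rt"): read() yields str, so the bytes check fails.
with gzip.open(io.BytesIO(COMPRESSED), "rt") as f:
    try:
        read_all_bytes(f)
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False

# Binary mode ("rb"): read() yields the decompressed bytes.
with gzip.open(io.BytesIO(COMPRESSED), "rb") as f:
    raw = read_all_bytes(f)
```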

Solution

  • Open the file with xopen in binary mode ("rb"); then it works:

    from xopen import xopen
    import polars as pl
    
    with xopen("data.csv.zst", "rb") as f:
        d = pl.read_csv(f)
    

    Beware that the entire decompressed file is read into memory before parsing, even if you only use a subset of its columns or rows.