python-3.x parquet

python 3 mac: snappy.compress AttributeError: module 'snappy' has no attribute 'compress'


Has anyone solved this error when reading Parquet files in Python? The traceback ends with:

compressions['SNAPPY'] = snappy.compress
AttributeError: module 'snappy' has no attribute 'compress'

Also, is there a way to read a whole directory?

I am using Python 3 through conda on a Mac, with snappy and thrift installed as per https://pypi.python.org/pypi/parquet
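One possible cause of this kind of error is that the module imported as snappy is not the compression binding the parquet package expects (for example, a different distribution that happens to use the same module name). That is an assumption about this setup, not something confirmed in the post. A minimal sketch to check which module the interpreter actually loads and whether it exposes compress; the helper name inspect_module is made up for illustration:

```python
import importlib


def inspect_module(module_name="snappy"):
    """Report where a module was loaded from and whether it has compress().

    Hypothetical helper for diagnosis only; it is not part of parquet,
    fastparquet, or any snappy package.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Nothing importable under that name at all.
        return {"installed": False, "path": None, "has_compress": False}
    return {
        "installed": True,
        # Built-in modules may have no __file__, hence the default.
        "path": getattr(module, "__file__", None),
        "has_compress": hasattr(module, "compress"),
    }
```

Calling print(inspect_module()) in the same conda environment shows the file path of whatever snappy resolves to. If has_compress is False, the path should hint at which installed package is providing the module.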

The code is as follows:

import parquet
import json
import fastparquet

with open(data_in_path + "file.parquet/part-01snappy.parquet", 'rb') as fo:
    for row in parquet.DictReader(fo, columns=['id', 'title']):
        print(json.dumps(row))

or

 df2 = fastparquet.ParquetFile(path).to_pandas()

Solution

  • I was not able to find a fix for snappy itself, so I read the data in Spark (which handles snappy) and wrote it back with gzip compression. After that, Python read the files without any issues:

    df.coalesce(1).write.mode("overwrite").option("compression", "gzip").parquet("dfWithGzip.parquet")
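On the second part of the question (reading a whole directory): Spark writes its output as a directory of part files, and fastparquet.ParquetFile accepts a list of paths as well as a single path. A rough sketch under those assumptions follows; the helper names list_part_files and read_parquet_dir and the part-*.parquet pattern are illustrative choices, not something from the original post:

```python
import glob
import os


def list_part_files(directory, pattern="part-*.parquet"):
    """Return the sorted part files inside a Spark-style Parquet directory.

    The default pattern is an assumption about how the part files are named;
    marker files such as _SUCCESS do not match it and are skipped.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


def read_parquet_dir(directory, columns=None):
    """Read all part files in a directory into a single pandas DataFrame."""
    # Third-party dependency, imported lazily so that listing files
    # still works when fastparquet is not installed.
    import fastparquet

    files = list_part_files(directory)
    if not files:
        raise FileNotFoundError(f"no part files found in {directory!r}")
    # A list of paths is treated as one Parquet data set.
    return fastparquet.ParquetFile(files).to_pandas(columns=columns)
```

For example, read_parquet_dir("dfWithGzip.parquet", columns=["id", "title"]) would load the gzip-compressed output written above, assuming the column names from the question.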