Has anyone solved this error when reading parquet in Python?

compressions['SNAPPY'] = snappy.compress
AttributeError: module 'snappy' has no attribute 'compress'

Btw, is there a way to read a whole directory of parquet files at once?
I am using Python 3 through conda on a Mac, with snappy and thrift installed as per https://pypi.python.org/pypi/parquet.
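In case it helps others: from what I can tell, this AttributeError usually means the wrong snappy package is installed, since snappy.compress and snappy.decompress come from python-snappy, not from the unrelated snappy package on PyPI. A quick sanity check (a sketch; the install commands are my assumption about a typical conda setup):

# If the wrong package is installed, swap it for python-snappy:
#   conda install python-snappy
# or, with pip:
#   pip uninstall snappy
#   pip install python-snappy

import snappy

# With the right bindings installed, both attributes should exist:
print(hasattr(snappy, 'compress'), hasattr(snappy, 'decompress'))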
My code is as follows:
import parquet
import json

with open(data_in_path + "file.parquet/part-01snappy.parquet", 'rb') as fo:
    # read the selected columns row by row and print each as JSON
    for row in parquet.DictReader(fo, columns=['id', 'title']):
        print(json.dumps(row))
or

import fastparquet

df2 = fastparquet.ParquetFile(path).to_pandas()
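On the side question about reading a whole directory: as far as I know the parquet module only takes a single file object, but one workaround is to glob the part files and concatenate them (a sketch, assuming path points at the directory Spark wrote and all parts share the same schema):

import glob

import fastparquet
import pandas as pd

# collect every part file under the dataset directory
parts = sorted(glob.glob(path + "/part-*.parquet"))

# read each part into a DataFrame and stack them into one
df_all = pd.concat(
    (fastparquet.ParquetFile(p).to_pandas() for p in parts),
    ignore_index=True,
)

Newer pandas versions also ship pd.read_parquet, which (with the pyarrow engine) can read a directory path directly.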
I was not able to find a solution for snappy itself, so I read the data in Spark (which handles snappy fine) and wrote it back with gzip compression; after that, no issues were found in Python:
df.coalesce(1).write.mode("overwrite").option("compression", "gzip").parquet("dfWithGzip.parquet")
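For completeness, the full round trip looks something like this in PySpark (a sketch; the paths and the SparkSession setup are my assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snappy-to-gzip").getOrCreate()

# Spark reads the snappy-compressed dataset without any extra setup
df = spark.read.parquet("file.parquet")

# rewrite as a single gzip-compressed part that plain Python can read
df.coalesce(1) \
    .write \
    .mode("overwrite") \
    .option("compression", "gzip") \
    .parquet("dfWithGzip.parquet")

spark.stop()

After this, the gzip-compressed part file reads fine from plain Python without touching snappy at all.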