Tags: python, pandas, parquet, pyarrow

pandas.DataFrame.to_parquet fails when S3 is the destination


I have a Pandas DataFrame that I'm trying to save as a Parquet file to S3:

import pandas as pd

dftest = pd.DataFrame({'field': [1, 2, 3]})
dftest.to_parquet("s3://bucket_name/test.parquet", engine='pyarrow',
                  compression='gzip')

I'm getting: "FileNotFoundError: bucket_name/test.parquet"


Solution

  • Although I still couldn't get the pandas.DataFrame.to_parquet approach to work with S3, I did find a different solution that seems to work:

    import s3fs
    from fastparquet import write

    # Open files through s3fs so fastparquet can write directly to the bucket
    s3 = s3fs.S3FileSystem()
    myopen = s3.open

    write('s3://bucketname/test.parquet', dftest, compression='GZIP',
          open_with=myopen)
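
  • For reference, a similar result should be achievable with pyarrow itself by handing it an s3fs filesystem explicitly instead of an s3:// URL. This is a minimal, untested sketch, assuming the same bucket_name as above and whatever AWS credentials s3fs picks up from the environment:

    import s3fs
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Pass the s3fs filesystem to pyarrow directly; the path then omits
    # the s3:// scheme since the filesystem already implies it
    s3 = s3fs.S3FileSystem()
    table = pa.Table.from_pandas(dftest)
    pq.write_table(table, 'bucket_name/test.parquet',
                   filesystem=s3, compression='gzip')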