python, amazon-s3, aws-lambda, parquet

Write JSON as a Parquet object to S3 from a Python Lambda function


I would like to write a JSON object to S3 in Parquet format from an AWS Lambda function (Python).

However, I cannot combine the fastparquet library with boto3 to do this: fastparquet's write method writes to a file, whereas boto3 expects an object to put into the S3 bucket.

Any suggestions?

fastparquet example

fastparquet.write('test.parquet', df, compression='GZIP', file_scheme='hive')

Boto3 example

client = authenticate_s3()
response = client.put_object(Body=Body, Bucket=Bucket, Key=Key)

The Body would be the Parquet content, which would then be written to S3.


Solution

  • You can write any DataFrame to S3 by passing the open_with argument to fastparquet's write function (see the fastparquet documentation)

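    import s3fs
    from fastparquet import write

    # s3.open returns file-like objects backed by S3, so fastparquet
    # writes straight to the bucket instead of the local disk.
    s3 = s3fs.S3FileSystem()
    myopen = s3.open
    write(
        'bucket-name/filename.parq.gzip',  # "<bucket>/<key>" path on S3
        frame,                             # the pandas DataFrame to write
        compression='GZIP',
        open_with=myopen
    )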
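
To connect this back to the question, here is a minimal sketch of how the snippet above might sit inside a Lambda handler. It rests on several assumptions that are not in the original post: the JSON is assumed to arrive as a string under event["body"], the records are assumed to be flat (no nested objects), and records_from_event is a hypothetical helper name. The bucket path is the placeholder from the answer, and pandas, s3fs and fastparquet would need to be packaged with the function, for example in a layer.

```python
import json


def records_from_event(event):
    """Return a list of row dicts parsed from a Lambda event.

    Assumption: the JSON payload is a string under event["body"] and is
    either a single object or a list of objects.
    """
    payload = json.loads(event["body"])
    return payload if isinstance(payload, list) else [payload]


def lambda_handler(event, context):
    # Third-party imports live inside the handler so the parsing helper
    # above can be exercised without them being installed.
    import pandas as pd
    import s3fs
    from fastparquet import write

    # One row per JSON record; flat records are assumed here.
    frame = pd.DataFrame(records_from_event(event))

    # With no explicit keys, s3fs falls back to the default AWS credential
    # chain, which in Lambda should resolve to the execution role.
    s3 = s3fs.S3FileSystem()
    path = "bucket-name/filename.parq.gzip"  # placeholder from the answer

    write(path, frame, compression="GZIP", open_with=s3.open)
    return {"written": path, "rows": len(frame)}
```

The execution role would need permission to put objects into the target bucket. No boto3 put_object call is needed in this variant, because s3fs performs the upload when fastparquet closes the file.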