json amazon-web-services amazon-s3 parquet amazon-kinesis-firehose

Write parquet from AWS Kinesis firehose to AWS S3


I would like to ingest data from Kinesis Firehose into S3 formatted as Parquet. So far I have only found a solution that involves creating an EMR cluster, but I am looking for something cheaper and faster, such as storing the received JSON as Parquet directly from Firehose or using a Lambda function.

Thank you very much, Javi.


Solution

  • Good news: this feature was released today!

    Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data in Amazon S3. Parquet and ORC are columnar data formats that save space and enable faster queries.
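    Because the conversion happens inside Firehose, producers keep sending plain JSON records. The following is a minimal sketch of a producer using boto3's `put_record_batch`; the stream name and event fields are hypothetical placeholders, and the JSON keys are assumed to match the column names of the schema the conversion uses.

```python
import json
from typing import Any, Dict, List


def encode_records(events: List[Dict[str, Any]]) -> List[Dict[str, bytes]]:
    """Encode each event as one JSON object per Firehose record."""
    return [{"Data": json.dumps(event).encode("utf-8")} for event in events]


def send_events(stream_name: str, events: List[Dict[str, Any]]) -> int:
    """Send up to 500 events in one batch; returns how many records failed."""
    import boto3  # imported lazily so encode_records works without boto3

    firehose = boto3.client("firehose")
    response = firehose.put_record_batch(
        DeliveryStreamName=stream_name,  # hypothetical stream name
        Records=encode_records(events),
    )
    # Failed records are not retried automatically; the caller should resend them.
    return response["FailedPutCount"]
```

    For example, `send_events("my-parquet-stream", [{"user_id": 1, "action": "click"}])` would send one JSON record, which Firehose then writes to S3 as Parquet.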

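    To enable it, open your Firehose delivery stream and click Edit. You should see a Record format conversion section, as in the screenshot below. Note that the conversion needs a schema for your records, which Firehose reads from a table in the AWS Glue Data Catalog.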

    [Screenshot: the Record format conversion section of the Firehose delivery stream settings]
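    The same setting can also be applied when creating the stream through the API. Below is a sketch with boto3, assuming an IAM role that can write to the bucket and read the Glue table, and a Glue database/table that already describe your JSON records; all ARNs and names are hypothetical placeholders. The config builder is kept separate from the API call so it can be inspected without AWS credentials.

```python
from typing import Any, Dict


def build_parquet_destination_config(
    role_arn: str, bucket_arn: str, glue_database: str, glue_table: str, region: str
) -> Dict[str, Any]:
    """Build an ExtendedS3DestinationConfiguration that converts JSON to Parquet."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        # Record format conversion requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # Firehose reads the record schema from this AWS Glue table.
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
                "Region": region,
                "VersionId": "LATEST",
            },
            # Incoming records are parsed as JSON...
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            # ...and written to S3 as Snappy-compressed Parquet.
            "OutputFormatConfiguration": {
                "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
            },
        },
    }


def create_parquet_stream(stream_name: str, config: Dict[str, Any]) -> str:
    """Create a DirectPut delivery stream and return its ARN."""
    import boto3  # imported lazily so the builder works without boto3

    firehose = boto3.client("firehose")
    response = firehose.create_delivery_stream(
        DeliveryStreamName=stream_name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=config,
    )
    return response["DeliveryStreamARN"]
```

    Using Snappy for the Parquet files is just one choice here; `ParquetSerDe` also accepts `GZIP` or `UNCOMPRESSED`.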

    See the documentation for details: https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html