I have a bit over 1200 JSON files in AWS S3 that I need to convert to Parquet and split into smaller files (I am preparing them for Redshift Spectrum). I have tried to create a Lambda function that does this for me per file, but the function either takes too long to complete or consumes too much memory, and therefore is killed before it finishes. The files are around 3-6 GB each.
Btw, I use Python.
I do not want to fire up an EC2 instance for this, since that takes forever to complete.
I would like some advice on how to accomplish this.
AWS Glue is useful for this kind of task. You can create a Glue job to convert the JSON data to Parquet format and save it to an S3 bucket of your choice. https://aws.amazon.com/blogs/big-data/build-a-data-lake-foundation-with-aws-glue-and-amazon-s3/
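A minimal sketch of what such a Glue job script could look like, assuming the source JSON sits under an example prefix like s3://your-source-bucket/json/ and you want the Parquet output under s3://your-target-bucket/parquet/ (both bucket names are placeholders, not your actual paths):

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Standard Glue job boilerplate: the JOB_NAME argument is passed in by Glue
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the JSON files straight from S3 (path is a placeholder)
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://your-source-bucket/json/"]},
    format="json",
)

# Write out as Parquet; Spark splits the output into multiple part files,
# which is what Redshift Spectrum works best with (path is a placeholder)
glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://your-target-bucket/parquet/"},
    format="parquet",
)

job.commit()
```

Because Spark distributes the work across the job's DPUs, the 3-6 GB files that overwhelm a single Lambda invocation are processed in parallel partitions, and the Parquet output comes out naturally split into multiple smaller files.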