I have an event stream that sends millions of events through SNS every day. A Lambda then stores these events in S3, each in its own file. The total size of the events is small (less than 1 GB), but moving or deleting one day's worth of files, each only a few bytes in size, is a slow process. Is there a way to store these SNS messages in larger files (or even a single file)?
I'd have the Lambda write the events to Kinesis Data Firehose, let Firehose batch them up to a size threshold or time window, and then have Firehose deliver those batches to S3.
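A rough sketch of what the Lambda side could look like, assuming the function is subscribed to the SNS topic and a delivery stream already exists. The environment variable `DELIVERY_STREAM_NAME` and the helper names (`sns_event_to_records`, `chunked`, `forward`) are made up for illustration; only `put_record_batch` is the actual boto3 Firehose call.

```python
import os

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH = 500


def sns_event_to_records(event):
    """Convert an SNS-triggered Lambda event into Firehose records.

    A newline is appended to each message so the events end up as
    one line each inside the S3 object that Firehose writes.
    """
    return [
        {"Data": (record["Sns"]["Message"] + "\n").encode("utf-8")}
        for record in event["Records"]
    ]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def forward(event, firehose, stream_name):
    """Send all SNS messages in `event` to the given delivery stream."""
    records = sns_event_to_records(event)
    failed = 0
    for batch in chunked(records, MAX_BATCH):
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=batch,
        )
        # Firehose reports partial failures instead of raising;
        # a real handler would retry the failed records.
        failed += response.get("FailedPutCount", 0)
    return {"sent": len(records), "failed": failed}


def handler(event, context):
    # Imported here so the helpers above can be exercised without boto3.
    import boto3

    firehose = boto3.client("firehose")
    # DELIVERY_STREAM_NAME is a hypothetical environment variable.
    return forward(event, firehose, os.environ["DELIVERY_STREAM_NAME"])
```

The batching itself is not done in this code. The size threshold and time window are set on the delivery stream's S3 destination through its buffering hints (`SizeInMBs` and `IntervalInSeconds`), so the Lambda only has to hand the records over.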
Here are some resources for that: