amazon-cloudwatch, amazon-cloudwatchlogs

Exporting CloudWatch logs in original format


I am looking for a way to export CloudWatch logs to S3 in their original form. I used the console to export a day's worth of logs from a log group, and it seems that a timestamp was prepended to each line, breaking the original JSON formatting. I wanted to import the export into Glue as a JSON file for a test transformation script. The original data is formatted as a normal JSON string when it is sent to CloudWatch, which is the form in which it is normally processed. It looks like:

{ "a": 123, "b": "456", "c": 789 }

After exporting and decompressing the data, it looks like:

2019-06-28T00:00:00.099Z { "a": 123, "b": "456", "c": 789 }

This breaks reading the line as a JSON string, since it is no longer in a standard format.
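
To make the format problem concrete, here is a minimal Python sketch. It assumes the prefix is always a single timestamp followed by one space, as in the sample line above. The helper name `parse_exported_line` is hypothetical, and the snippet only illustrates the line format; it is not meant as a way to process the full export locally.

```python
import json


def parse_exported_line(line):
    """Split an exported line into its timestamp prefix and the original JSON payload."""
    # Assumption: everything before the first space is the prepended timestamp.
    timestamp, _, message = line.partition(" ")
    return timestamp, json.loads(message)


exported = '2019-06-28T00:00:00.099Z { "a": 123, "b": "456", "c": 789 }'

# Parsing the exported line directly fails because of the timestamp prefix.
try:
    json.loads(exported)
except json.JSONDecodeError as err:
    print("not valid JSON:", err)

# Splitting off the prefix recovers the original record.
timestamp, record = parse_exported_line(exported)
print(timestamp, record)
```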

The dataset is fairly large (100 GB+) for this run and may grow in the future, so running a CLI command and processing each line locally isn't feasible in my opinion. Is there any known way to do what I am looking to do?

Thank you


Solution

  • Timestamps are automatically added when you push logs to CloudWatch. Every log event in CloudWatch has a timestamp.

    You can create a subscription filter that delivers to Kinesis Firehose, use a Lambda function on the Firehose stream to reformat the log events (remove the timestamp), and then store the logs in S3.

    https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html
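
A minimal sketch of what such a Firehose transformation Lambda might look like in Python. It assumes the usual shape of CloudWatch Logs subscription data: each Firehose record is base64-encoded, gzip-compressed JSON with a `messageType` and a `logEvents` list, where each event carries its `timestamp` separately from its `message`. The function name `transform_record` is hypothetical, and the sketch ignores error handling and oversized responses, so treat it as a starting point rather than a finished implementation.

```python
import base64
import gzip
import json


def transform_record(record):
    """Turn one Firehose record from CloudWatch Logs into newline-delimited messages."""
    payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

    # Control messages carry no log data, so they are dropped.
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"recordId": record["recordId"], "result": "Dropped"}

    # The timestamp lives in its own field, so "message" should already hold
    # the line as it was sent to CloudWatch; emit one message per line.
    lines = "".join(e["message"].rstrip("\n") + "\n" for e in payload["logEvents"])
    return {
        "recordId": record["recordId"],
        "result": "Ok",
        "data": base64.b64encode(lines.encode("utf-8")).decode("ascii"),
    }


def lambda_handler(event, context):
    # Firehose expects one output entry per input recordId.
    return {"records": [transform_record(r) for r in event["records"]]}
```

With this approach the objects Firehose writes to S3 would contain only the original JSON lines, which Glue should be able to read as JSON. Note that a subscription filter only forwards events ingested after it is created, so it does not cover logs that are already stored.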