
Processing CloudFront logs with Elastic MapReduce (Pig)


I would like to process the access logs that Amazon CloudFront creates with Amazon Elastic MapReduce.

I just need some simple stats on how many times different files have been loaded from CloudFront, so I thought I would write a simple Pig script for this.

The first problem I have is that CloudFront writes the logs gzipped, and as far as I know I can't read .gz files in Pig?

Any suggestions on how I should do this? I'm very new to Elastic MapReduce, so any hints on how to structure this kind of job are welcome.


Solution

  • Sorry, this works by default: Pig reads the gzipped logs directly, so there is no need to unzip them before processing. My bad.
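
For anyone landing here with the same question, here is a minimal sketch of what such a script could look like. It is not a tested job. The bucket name and prefixes are made up, the field names are my own labels, and the column order assumes the standard tab-separated CloudFront access log layout (date, time, x-edge-location, sc-bytes, c-ip, cs-method, cs(Host), cs-uri-stem, sc-status, ...), so check it against the `#Fields:` header in your own logs.

```pig
-- Sketch only: 'example-bucket' and both prefixes are hypothetical placeholders.
-- The .gz files are decompressed automatically, based on the file extension.
-- Only the first nine columns are named; the column order is an assumption
-- taken from the standard CloudFront access log format.
raw = LOAD 's3://example-bucket/cloudfront-logs/' USING PigStorage('\t')
      AS (log_date:chararray, log_time:chararray, edge_location:chararray,
          sc_bytes:long, c_ip:chararray, cs_method:chararray,
          cs_host:chararray, cs_uri_stem:chararray, sc_status:int);

-- Each log file starts with '#Version' and '#Fields' header lines; drop them.
requests = FILTER raw BY NOT (log_date MATCHES '#.*');

-- Count how many times each file (URI stem) was requested.
by_uri = GROUP requests BY cs_uri_stem;
hits   = FOREACH by_uri GENERATE group AS uri, COUNT(requests) AS hit_count;

-- Most requested files first.
sorted = ORDER hits BY hit_count DESC;

STORE sorted INTO 's3://example-bucket/cloudfront-stats/';
```

Two things to keep in mind: the decompression is triggered by the `.gz` extension, so don't rename the files, and gzip is not splittable, so each log file is read by a single mapper. With many small CloudFront log files that is usually fine.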