I would like to use Amazon Elastic MapReduce to process the access logs that Amazon CloudFront creates.
I just need some simple stats on how many times different files have been loaded from CloudFront, so I thought I should just write a simple Pig script for this.
The first problem I have is that CloudFront writes the logs gzipped, and as far as I know I can't read .gz files in Pig?
Any suggestions on how I should do this? I'm very new to Elastic MapReduce, so any hints on how to structure this kind of job are welcome.
Update: Sorry, this works by default. Pig reads .gz input files transparently, so there is no need to unzip the logs before processing them. My bad.
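
For anyone else trying the same thing, here is a rough sketch of the kind of Pig script I mean. It is not a tested job, only an outline under some assumptions: the bucket and paths are hypothetical placeholders, the logs are assumed to be the tab-separated CloudFront download-distribution format with header lines starting with '#', and the requested file (cs-uri-stem) is assumed to be the 8th field. Check the "#Fields:" header line in your own logs, and adjust the S3 URI scheme to whatever your EMR setup expects.

```pig
-- Hypothetical input location; the .gz files are loaded directly, no unzip step.
raw = LOAD 's3://my-bucket/cloudfront-logs/*.gz' USING PigStorage('\t');

-- Drop the '#Version' and '#Fields' header lines at the top of each log file.
data = FILTER raw BY NOT ((chararray)$0 MATCHES '#.*');

-- Assumption: field 8 (index 7) is cs-uri-stem, the path of the requested file.
uris = FOREACH data GENERATE (chararray)$7 AS uri;

-- Count requests per file and list the most requested files first.
grouped = GROUP uris BY uri;
counts = FOREACH grouped GENERATE group AS uri, COUNT(uris) AS hits;
sorted = ORDER counts BY hits DESC;

-- Hypothetical output location; it must not already exist when the job runs.
STORE sorted INTO 's3://my-bucket/cloudfront-stats';
```

The output should be one tab-separated line per file with its request count.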