hadoop, amazon-web-services, apache-zookeeper, elastic-map-reduce, emr

Can I access ZooKeeper from an AWS Elastic MapReduce job?


I'm new to Hadoop and am running jobs on AWS Elastic MapReduce.

I need cluster-wide atomic counters in Hadoop, and it was suggested that I use ZooKeeper for this.

I believe ZooKeeper is part of the Hadoop stack (right?). How would I access it from an Elastic MapReduce job in order to set and update a cluster-wide counter?


Solution

  • I believe ZooKeeper is part of the Hadoop stack (right?)

    ZooKeeper (ZK) is not part of the Hadoop stack. It is a Top-Level Project (TLP) at Apache and is independent of Hadoop. So, first, ZK has to be installed on EC2. Here are the instructions for doing so.

  • How would I access it from an Elastic MapReduce job in order to set and update a cluster-wide counter?

    Once installed, ZK can be used to implement a cluster-wide counter through the ZK API. Here (1 and 2) are discussions of the approach, with its pros and cons. Here are some alternatives to ZK for the same requirement.
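
A common way to build such a counter on ZK is an optimistic read-modify-write loop: read the znode's data together with its version, write the new value conditioned on that version, and retry if another client got there first. In the Java client this corresponds to `ZooKeeper.getData(path, watch, stat)` followed by `ZooKeeper.setData(path, data, version)`, which fails with `KeeperException.BadVersionException` on a stale version. The sketch below only illustrates that retry pattern. It does not talk to a real ZooKeeper ensemble: `FakeZnode`, `BadVersionError`, `increment`, and `run_demo` are hypothetical names invented for this example, and `FakeZnode` is an in-memory stand-in for a single znode.

```python
import threading


class BadVersionError(Exception):
    """Hypothetical stand-in for ZooKeeper's bad-version failure."""


class FakeZnode:
    """Hypothetical in-memory stand-in for one znode: bytes plus a version."""

    def __init__(self, data=b"0"):
        self._lock = threading.Lock()
        self._data = data
        self._version = 0

    def get(self):
        # Return the data and the version it was read at, like getData + Stat.
        with self._lock:
            return self._data, self._version

    def set(self, data, version):
        # Conditional write: succeeds only if nobody has written since `version`.
        with self._lock:
            if version != self._version:
                raise BadVersionError("expected version %d" % version)
            self._data = data
            self._version += 1


def increment(znode, delta=1):
    """Add `delta` to the counter, retrying until the conditional write wins."""
    while True:
        data, version = znode.get()
        new_value = int(data.decode()) + delta
        try:
            znode.set(str(new_value).encode(), version)
            return new_value
        except BadVersionError:
            # Another client updated the counter first; re-read and try again.
            continue


def run_demo(workers=8, increments_per_worker=100):
    """Simulate several tasks incrementing the same counter concurrently."""
    znode = FakeZnode()

    def work():
        for _ in range(increments_per_worker):
            increment(znode)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return int(znode.get()[0].decode())


if __name__ == "__main__":
    print(run_demo())
```

Because every write is conditioned on the version that was read, no increment is lost even when many tasks update the counter at once. The cost is retries under contention, which is one of the trade-offs to weigh when many mappers or reducers hit the same znode.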