hadoop, amazon-ec2, elastic-map-reduce

How can I share JAR libraries with Amazon Elastic MapReduce?


To speed up uploading my job JAR to S3, I want to copy all my common JARs to something like "$HADOOP_HOME/lib" in a normal Hadoop installation. Is it possible to create a custom EMR Hadoop instance with these libraries preinstalled? Or is there an easier way?


Solution

  • You could do this with a bootstrap action. Place a script that does the copying in S3; then, if you're starting EMR from the command line, add a parameter like this:

    --bootstrap-action 's3://my-bucket/bootstrap.sh'
    

    Or if you're doing it through the web interface, just enter the location in the appropriate field.
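
The script itself can be very short. Here is a minimal sketch of what it might contain. It assumes the shared JARs sit under a hypothetical s3://my-bucket/libs prefix and that Hadoop's library directory on the node is /home/hadoop/lib; both are assumptions, and the library directory in particular varies between EMR releases, so check it on your own cluster first.

```shell
#!/bin/bash
# bootstrap.sh: sketch of a bootstrap action that copies shared JARs onto
# each node before Hadoop starts. All locations below are assumptions.
set -eu

# Hypothetical S3 prefix that holds the shared JARs.
JAR_SOURCE="${JAR_SOURCE:-s3://my-bucket/libs}"
# Assumed Hadoop library directory on the node (differs between EMR releases).
LIB_DIR="${LIB_DIR:-/home/hadoop/lib}"

copy_shared_jars() {
  # $1 = source prefix, $2 = local destination directory
  mkdir -p "$2"
  # The glob is quoted so that hadoop, not the local shell, expands it.
  hadoop fs -get "$1/*.jar" "$2/"
}

if command -v hadoop >/dev/null 2>&1; then
  copy_shared_jars "$JAR_SOURCE" "$LIB_DIR"
else
  # Off-cluster (no hadoop CLI): only report what would be copied.
  echo "hadoop not found; would copy $JAR_SOURCE/*.jar to $LIB_DIR" >&2
fi
```

Because the bootstrap action runs on every node as it starts, the JARs are in place before any job runs, and your job JAR no longer needs to bundle them.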