I want to work with an EBS snapshot in an EMR job. Because the mapper reads from the snapshot, I want the snapshot mounted on every node. Is there an easy way to do that other than logging in to each node? I guess I could make the first step of my MapReduce job mount it, but that seems wrong. Is there an easier way to do it?
It is possible, but you'll have to jump through some hoops to get it to work. I'm assuming you already have a recipe, in the form of a shell script, for creating an EBS volume from the EBS snapshot. EMR provides bootstrap actions, which are just shell scripts that you write and EMR runs on each node of the cluster. Bootstrap actions run before any jobs (steps, in EMR terms) are allowed to run.
Here are the steps you need to have your shell script perform:
To get the current instance id, use the metadata service:
wget -q -O - http://instance-data/latest/meta-data/instance-id
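For illustration, here is a rough sketch of what such a bootstrap script might look like, assuming the work is: create a volume from the snapshot, attach it to the current instance, and mount it. The snapshot ID, device name, mount point, and function names are all placeholders I made up. It also assumes the AWS CLI is available on the node, that the instance role is allowed to make these EC2 calls, and that the metadata service answers plain unauthenticated requests.

```shell
#!/bin/sh
# Sketch of an EMR bootstrap action: create a volume from a snapshot,
# attach it to this instance, and mount it. All names are placeholders.
set -e

SNAPSHOT_ID="${SNAPSHOT_ID:-snap-0123456789abcdef0}"  # hypothetical snapshot ID
DEVICE="${DEVICE:-/dev/sdf}"                          # device name requested at attach time
MOUNT_POINT="${MOUNT_POINT:-/mnt/snapshot}"           # hypothetical mount point
METADATA="http://instance-data/latest/meta-data"

# Read one value from the instance metadata service.
metadata() {
    wget -q -O - "$METADATA/$1"
}

# Create a volume from the snapshot in the given zone; prints the volume ID.
create_volume() {
    aws ec2 create-volume \
        --snapshot-id "$1" \
        --availability-zone "$2" \
        --query VolumeId --output text
}

# Attach the volume to the instance, waiting on the volume state before and after.
attach_volume() {
    aws ec2 wait volume-available --volume-ids "$1"
    aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3" >/dev/null
    aws ec2 wait volume-in-use --volume-ids "$1"
}

mount_snapshot() {
    instance_id=$(metadata instance-id)
    zone=$(metadata placement/availability-zone)
    # Assumption: the region is the zone name minus its trailing letter.
    export AWS_DEFAULT_REGION="${zone%?}"
    volume_id=$(create_volume "$SNAPSHOT_ID" "$zone")
    attach_volume "$volume_id" "$instance_id" "$DEVICE"
    sudo mkdir -p "$MOUNT_POINT"
    # The kernel may expose the device under another name (e.g. /dev/xvdf
    # or an NVMe name), so this path may need adjusting.
    sudo mount "$DEVICE" "$MOUNT_POINT"
}
```

The block only defines functions. To use it as a bootstrap action, end the script with a call to `mount_snapshot`. The volume has to be created in the same availability zone as the instance, which is why the zone is read from the metadata service as well.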
Once you have your shell script, you need to upload it to S3, and then add that script as a bootstrap action to your cluster: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html
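If you use the AWS CLI, the upload and cluster launch might look roughly like this. The function name, cluster name, release label, instance type, and instance count are only example values, and `--use-default-roles` assumes the default EMR roles already exist in your account.

```shell
# Upload a script to S3 and launch a cluster that runs it as a bootstrap action.
# Cluster name, release label, and instance settings are placeholder values.
launch_cluster() {
    script_path="$1"   # local script, e.g. mount-snapshot.sh
    s3_uri="$2"        # destination, e.g. s3://my-bucket/bootstrap/mount-snapshot.sh
    aws s3 cp "$script_path" "$s3_uri"
    aws emr create-cluster \
        --name "snapshot-cluster" \
        --release-label emr-6.15.0 \
        --applications Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions "Path=$s3_uri,Name=MountSnapshot"
}
```

You would call it as, say, `launch_cluster mount-snapshot.sh s3://my-bucket/bootstrap/mount-snapshot.sh`, with the bucket and key replaced by your own.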
Also beware: you will be charged for each EBS volume you create, so make sure the delete-on-termination logic is set up properly!
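One way to do that, sketched here with the AWS CLI, is to set the `DeleteOnTermination` flag on the attachment once the volume is attached. The function name is hypothetical, and the device name must match whatever you used when attaching the volume.

```shell
# Mark an attached volume so EC2 deletes it when the instance terminates.
delete_on_termination() {
    instance_id="$1"   # e.g. the ID returned by the metadata service
    device="$2"        # device name used at attach time, e.g. /dev/sdf
    aws ec2 modify-instance-attribute \
        --instance-id "$instance_id" \
        --block-device-mappings \
        "[{\"DeviceName\":\"$device\",\"Ebs\":{\"DeleteOnTermination\":true}}]"
}
```

It is still worth checking the EC2 console for leftover volumes after the cluster terminates, since a bootstrap script that fails partway through can leave a volume created but never attached.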