When I create a streaming job with Amazon Elastic MapReduce (Amazon EMR) using the Ruby command line interface, how can I specify that only EC2 spot instances should be used (except for the master)? The command below works, but it forces me to use at least 1 core instance:
./elastic-mapreduce --create --stream \
--name n2_3 \
--input s3://mr/neuron/2 \
--output s3://mr-out/neuron/2 \
--mapper s3://mr/map.rb \
--reducer s3://mr/noop_reduce.rb \
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 1 \
--instance-group task --instance-type m1.small --instance-count 18 --bid-price 0.028
Thanks
Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes (the HDFS storage daemons), so yes, you need at least one CORE node.
So you could run all of the core nodes as spot instances instead:
./elastic-mapreduce --create --stream \
...
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 19 --bid-price 0.028
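If the node count or bid price changes between runs, the instance-group flags can be generated rather than retyped. The following is only a minimal sketch: `spot_core_args` is a hypothetical helper name, not part of the EMR CLI, and it emits nothing beyond the flags already used in the command above.

```shell
# Hypothetical helper (not part of elastic-mapreduce): prints the
# instance-group flags for one on-demand master plus spot core nodes.
# Assumes the same instance type is used for master and core nodes.
spot_core_args() {
  sca_count="$1"              # number of core nodes
  sca_bid="$2"                # spot bid price, as passed to --bid-price
  sca_type="${3:-m1.small}"   # instance type, defaults to m1.small
  printf '%s\n' "--instance-group master --instance-type $sca_type --instance-count 1 --instance-group core --instance-type $sca_type --instance-count $sca_count --bid-price $sca_bid"
}

# Print the flags for 19 spot core nodes at a 0.028 bid.
spot_core_args 19 0.028
```

The printed flags can be appended to the `./elastic-mapreduce --create --stream` command through unquoted command substitution, for example `$(spot_core_args 19 0.028)`.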
P.S. You could also run one CORE node and many TASK nodes, but depending on how much reading and writing your job does, it will be painful, since 18 nodes will be reading from and writing to a single node.
# expect problems....
./elastic-mapreduce --create --stream \
...
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 1 --bid-price 0.028 \
--instance-group task --instance-type m1.small --instance-count 18 --bid-price 0.028