Tags: apache-spark, amazon-ec2, ec2-ami, spark-ec2

Spark: How to increase drive size in slaves


How do I start a cluster whose slaves each have a 100GB drive?

./spark-ec2 -k xx -i xx.pem -s 1 --hadoop-major-version=yarn --region=us-east-1 \
--zone=us-east-1b  --spark-version=1.6.1 \
--vpc-id=vpc-xx --subnet-id=subnet-xx --ami=ami-yyyyyy \
 launch cluster-test

I used an AMI whose root volume is 100GB in size, yet Spark appears to have resized it and the slaves start with only an 8GB drive. How do I increase that limit to 100GB?
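
On a slave node the mismatch shows up as the raw block device being 100GB while the root filesystem spans only 8GB. For example (device names such as xvda are an assumption and vary by AMI):

    lsblk       # the disk, e.g. xvda, reports 100G
    df -h /     # the root filesystem reports only ~8G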


Solution

  • The steps below solve the issue, though the question still seeks a way to avoid it in the first place.

    It turns out that the EBS volume really is 100GB, but the image written to it is only 8GB, which is why only 8GB is visible. To spread the image across the whole disk, this blog describes in detail how to do it, and this SO answer is also helpful; a rough sketch follows below.
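
    As a rough sketch (not taken from the linked blog), assuming an ext3/ext4 root filesystem on /dev/xvda with the root partition being partition 1 (check lsblk first, since device and partition names vary by AMI), growing the partition and filesystem on each slave looks roughly like this:

        # Grow partition 1 to fill the 100GB volume (growpart comes from cloud-utils-growpart)
        sudo growpart /dev/xvda 1
        # Grow an ext3/ext4 filesystem to the new partition size
        sudo resize2fs /dev/xvda1
        # For an XFS root filesystem use instead: sudo xfs_growfs /
        # If the filesystem sits directly on /dev/xvda with no partition table,
        # only the resize step is needed.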