I'm baking an AMI from a database server that has a 300 GB root volume, about 80% of which is in use. We bake the AMI because we need multiple new instances with exactly the same data every day. An AMI is the appropriate solution because our data restoration process is extremely slow, so it can't be run after the instances are created. We want instances to be ready with all the data in 7-8 minutes.
However, performance on the new instances is extremely poor. The reason is that the instances use EBS volumes, which need to be initialized as described in this doc:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
Unfortunately, the initialization process takes 5-6 hours, which is not workable for us.
So, what is the best practice for baking an AMI when the data that needs to be in it is really big?
Now, I have found something that helped a lot with initializing an EBS volume.
AWS recommends dd or fio for initializing EBS volumes. Running a single dd process takes too much time. Running multiple dd processes in parallel, each pulling a small chunk of data from a given block, makes initialization much quicker:
nohup seq 0 $(($(cat /sys/block/xvda/size) / (1 << 10))) | xargs -n1 -P8 -I {} sudo dd if=/dev/xvda of=/dev/null skip={}k count=1 bs=512 > /dev/null 2>&1 &
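
To make it clearer what the one-liner above actually reads, here is a small sketch of its arithmetic. It does not touch any device. The helper names stride_count and stride_offset_bytes are hypothetical, and the 300 GiB size is an assumed example (treating the 300 GB volume as 300 GiB).

```shell
#!/bin/sh
# Sketch of the arithmetic behind the parallel dd one-liner.
# stride_count and stride_offset_bytes are hypothetical helper names.

# /sys/block/<dev>/size reports the device size in 512-byte sectors.
# Dividing by 1024 (1 << 10) gives the number of 512 KiB strides.
stride_count() {
    echo $(( $1 / (1 << 10) ))
}

# dd skip=<n>k with bs=512 skips n * 1024 blocks of 512 bytes,
# so stride n starts at this byte offset.
stride_offset_bytes() {
    echo $(( $1 * 1024 * 512 ))
}

# Assumed example: a 300 GiB volume has 300 * 2^30 / 512 sectors.
sectors=$(( 300 * 1024 * 1024 * 1024 / 512 ))
echo "sectors: $sectors"
echo "strides: $(stride_count "$sectors")"
echo "offset of stride 1: $(stride_offset_bytes 1) bytes"
```

So for a volume of this size, seq feeds roughly 614,400 offsets to xargs, and each dd reads a single 512-byte sector at the start of a 512 KiB stride, with 8 dd processes running at a time. This only warms the whole volume if touching one sector causes EBS to fetch at least the surrounding 512 KiB; that is an assumption the command appears to rely on, not something stated in the AWS doc, so it is worth verifying with a benchmark on your own volumes.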