Can someone please clarify what the use case is for having an EBS volume in an EMR cluster (transient / on-demand cluster)?
What are the benefits of using an EBS volume in EMR, given that the EBS volume is deleted when the EMR cluster terminates?
I am planning to set up an EMR cluster to run Spark-based ETL jobs and am looking for some input. I can go with EMRFS/S3, but I am wondering why EBS is offered in EMR at all.
Thanks.
Some EC2 instance types supported by EMR have no instance store and offer EBS-only storage (e.g., the c4 and m4 series). These instances require EBS in order to be used with EMR, and a default volume of 10 GB will be attached to each instance unless you specify a larger volume.
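Of course, EBS may also be used with other instance types that already include storage, if you require more storage than the instance provides. Even on a transient cluster that reads and writes its data through EMRFS/S3, the nodes still use local storage for HDFS and for temporary data such as Spark shuffle and spill files, so the volume is useful for the lifetime of the cluster even though it is deleted afterwards.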
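As a rough illustration of specifying a larger volume, here is a minimal Python sketch that builds one instance-group entry in the shape that boto3's EMR `run_job_flow` call accepts under `Instances={"InstanceGroups": [...]}`. The helper name `build_instance_group`, the `m4.xlarge` instance type, the 100 GB size and the `gp2` volume type are all assumptions chosen for the example, not recommendations. The sketch only builds the dictionary and does not call AWS.

```python
VALID_ROLES = {"MASTER", "CORE", "TASK"}


def build_instance_group(name, role, instance_type, count,
                         ebs_size_gb, volume_type="gp2",
                         volumes_per_instance=1):
    """Return one EMR instance-group dict with an explicit EBS volume.

    Hypothetical helper for illustration; all values are placeholders.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}")
    if ebs_size_gb <= 0 or volumes_per_instance <= 0:
        raise ValueError("volume size and count must be positive")

    return {
        "Name": name,
        "InstanceRole": role,
        "InstanceType": instance_type,
        "InstanceCount": count,
        "EbsConfiguration": {
            "EbsBlockDeviceConfigs": [
                {
                    # Size and type of each EBS volume attached to a node.
                    "VolumeSpecification": {
                        "VolumeType": volume_type,
                        "SizeInGB": ebs_size_gb,
                    },
                    # How many such volumes each instance receives.
                    "VolumesPerInstance": volumes_per_instance,
                }
            ],
        },
    }


if __name__ == "__main__":
    # Example: two EBS-only core nodes with 100 GB of EBS each (assumed values).
    core_group = build_instance_group("Core", "CORE", "m4.xlarge", 2, 100)
    print(core_group)
```

The resulting dictionary would be passed along with a master group in the `InstanceGroups` list. The volumes it describes live only as long as the cluster, which is fine for HDFS and shuffle data in a transient ETL run where the final output goes to S3.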
For more information about EMR and EBS, see https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html.