I am doing a reading on AWS EMR on VPC but it seems like it is more of design consideration for AWS EMR Service to access EMR cluster for calls.
What I am trying to do is host a VPC with ALB and EC2 instance running an application as a service to access EMR cluster.
VPC -> Internet Gateway -> Load Balancer -> EC2 (Application endpoints) -> EMR Cluster
I don't want Cluster to be accessible from outside except through Public IP of IG. But Public IP can access only EC2 instance hosting application which calls EMR cluster on same VPC.
Is it recommended approach?
The design looks something like below. Some challenges I am tackling is how to access S3 from EMR if on VPC, and if the application is running on EC2 can it access EMR cluster, and if EMR cluster would be available publicly?
Any guidance links or recommendations would be welcome.
EDIT:
Or if I create EMR on VPC do i need to wrap it inside of another VPC something like below?
The simplest design is:
If you are security-paranoid, then you could use:
The first option is simpler and Security Groups act as firewalls that can fully protect the EMR cluster. You would create three security groups:
This will permit only the Load Balancer to communicate with the EC2 instances and only the EC2 instances to communicate with the EMR cluster. The EMR cluster will be able to connect directly to the Internet to access Amazon S3 due to default rules permitting Outbound access.