I have an EMR cluster that is spinning up in a private subnet in eu-west-1. I have defined a gateway endpoint for S3 in the route table. I need to access this public bucket/location exposed by AWS:
s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
but the request fails with the error below. I think this is because cross-region access through a gateway endpoint is not allowed. I am able to access other buckets that are in the same region. Is there a workaround to access this, maybe through NAT? The route table already has a route to a NAT gateway, but the request is somehow not going through it.
2019-04-10T05:17:06.849Z INFO Ensure step 1 jar file s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
INFO Failed to download: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
java.lang.RuntimeException: Error whilst fetching 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'
at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:412)
at aws157.instancecontroller.util.S3Wrapper.fetchHadoopFileToLocal(S3Wrapper.java:351)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner$Runner.<init>(HadoopJarStepRunner.java:243)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:152)
at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:146)
at aws157.instancecontroller.master.steprunner.StepExecutor.runStep(StepExecutor.java:136)
at aws157.instancecontroller.master.steprunner.StepExecutor.run(StepExecutor.java:70)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.enqueueStep(StepExecutionManager.java:248)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.doRun(StepExecutionManager.java:195)
at aws157.instancecontroller.master.steprunner.StepExecutionManager.access$000(StepExecutionManager.java:33)
at aws157.instancecontroller.master.steprunner.StepExecutionManager$1.run(StepExecutionManager.java:94)
Caused by: com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1143)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1021)
at aws157.instancecontroller.util.S3Wrapper.copyS3ObjectToFile(S3Wrapper.java:303)
at aws157.instancecontroller.util.S3Wrapper.getFile(S3Wrapper.java:287)
at aws157.instancecontroller.util.S3Wrapper.fetchS3HadoopFileToLocal(S3Wrapper.java:399)
... 10 more
An S3 gateway endpoint will never try to route cross-region traffic, but a NAT Gateway should handle this traffic automatically. Given the assertion that a NAT Gateway is in place, the error
Unable to execute HTTP request: connect timed out
implies that the NAT Gateway (or a setting associated with it) is misconfigured.
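To illustrate why the two paths do not conflict, here is a minimal Python sketch of most-specific-route selection. A gateway endpoint adds a route for the regional S3 prefix list only, so traffic to S3 in another region falls through to the default route. The route table, the target names, and the CIDR standing in for the prefix list are all hypothetical placeholders, not real AWS ranges.

```python
import ipaddress

# Hypothetical route table for the private subnet. 198.51.100.0/24 is a
# documentation range used here as a stand-in for the eu-west-1 S3 prefix list.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("198.51.100.0/24", "s3-gateway-endpoint"),
    ("0.0.0.0/0", "nat-gateway"),
]


def pick_target(dest_ip, routes):
    """Return the target of the most specific route containing dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (largest prefixlen) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# An address inside the placeholder "regional S3" range uses the endpoint.
same_region_s3 = pick_target("198.51.100.10", ROUTES)
# Any other public address (for example S3 in another region) uses the NAT route.
other_region_s3 = pick_target("203.0.113.10", ROUTES)
```

In this sketch the cross-region request is handed to the NAT route, which is why a timeout points at the NAT path rather than at the endpoint.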
As noted in comments, the specific issue here was that the NAT Gateway was provisioned in the same subnet it was intended to serve. This isn't a valid configuration, because in this case the NAT Gateway tries to reach the Internet... via itself... since it uses the default route of the subnet where it's deployed.
To create a NAT gateway, you must specify the public subnet in which the NAT gateway should reside.
...
After you've created a NAT gateway, you must update the route table associated with one or more of your private subnets to point Internet-bound traffic to the NAT gateway. This enables instances in your private subnets to communicate with the internet. (emphasis added)
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html#nat-gateway-basics
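One way to check for this misconfiguration is to look up the route table that applies to the NAT Gateway's own subnet and confirm that its default route targets an internet gateway. The sketch below is a rough illustration, not a complete tool: the helper names, the sample IDs, and the `check_live` wrapper are assumptions made for the example. It works on dictionaries shaped like the output of boto3's `describe_route_tables`, and it ignores pagination.

```python
def effective_route_table(subnet_id, route_tables):
    """Return the table explicitly associated with subnet_id, else the main table."""
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main = table
    return main


def default_route_target(table):
    """Return the gateway ID that 0.0.0.0/0 points at, or None if there is none."""
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            return route.get("GatewayId") or route.get("NatGatewayId")
    return None


def nat_subnet_is_public(nat_subnet_id, route_tables):
    """True if the NAT Gateway's subnet sends default traffic to an internet gateway."""
    table = effective_route_table(nat_subnet_id, route_tables)
    target = default_route_target(table) if table else None
    return bool(target) and target.startswith("igw-")


def check_live(nat_gateway_id, region="eu-west-1"):
    """Hypothetical wrapper: run the same check against a real account via boto3."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    nat = ec2.describe_nat_gateways(NatGatewayIds=[nat_gateway_id])["NatGateways"][0]
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "vpc-id", "Values": [nat["VpcId"]]}]
    )["RouteTables"]
    return nat_subnet_is_public(nat["SubnetId"], tables)


# Made-up sample data: the NAT Gateway sits in the private subnet it serves.
PRIVATE_TABLE = {
    "RouteTableId": "rtb-private",
    "Associations": [{"SubnetId": "subnet-private"}],
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example"},
    ],
}
# Made-up sample data: a public subnet whose default route is an internet gateway.
PUBLIC_TABLE = {
    "RouteTableId": "rtb-public",
    "Associations": [{"SubnetId": "subnet-public"}],
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"},
    ],
}

misplaced = nat_subnet_is_public("subnet-private", [PRIVATE_TABLE, PUBLIC_TABLE])
correct = nat_subnet_is_public("subnet-public", [PRIVATE_TABLE, PUBLIC_TABLE])
```

With the sample data, the check fails for a NAT Gateway placed in the private subnet and passes for one placed in the public subnet, which matches the requirement quoted above.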