I am trying to run an AWS Lambda application written in JavaScript, but I can't make it work properly. I have no trouble with the JS configuration or with triggering (I successfully ran a hello world app), but I'm having problems with the aws-sdk library. To be honest, I don't know whether this is related to network configuration or to IAM configuration, but I'm pretty sure it's not a scripting issue, because the same code runs without any problem on my own computer. The main problem is that when the Lambda function calls the AWS EMR API, the call times out. It's as if Lambda cannot communicate with EMR.
Here is the EMR client, as printed by console.log(emr_client):
emr: Service {
  config:
    Config {
      credentials:
        EnvironmentCredentials {
          expired: false,
          expireTime: null,
          accessKeyId: 'XXXXXXXXXXXXXXXX',
          sessionToken: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
          envPrefix: 'AWS' },
      credentialProvider: CredentialProviderChain { providers: [Array] },
      region: 'us-west-2',
      logger: null,
      apiVersions: {},
      apiVersion: '2009-03-31',
      endpoint: 'elasticmapreduce.us-west-2.amazonaws.com',
      httpOptions: { timeout: 120000 },
      maxRetries: undefined,
      maxRedirects: 10,
      paramValidation: true,
      sslEnabled: true,
      s3ForcePathStyle: false,
      s3BucketEndpoint: false,
      s3DisableBodySigning: true,
      computeChecksums: true,
      convertResponseTypes: true,
      correctClockSkew: false,
      customUserAgent: null,
      dynamoDbCrc32: true,
      systemClockOffset: 0,
      signatureVersion: 'v4',
      signatureCache: true,
      retryDelayOptions: {},
      useAccelerateEndpoint: false,
      accesKeyId: 'XXXXXXXXXXXXXXXX' },
  isGlobalEndpoint: false,
  endpoint:
    Endpoint {
      protocol: 'https:',
      host: 'elasticmapreduce.us-west-2.amazonaws.com',
      port: 443,
      hostname: 'elasticmapreduce.us-west-2.amazonaws.com',
      pathname: '/',
      path: '/',
      href: 'https://elasticmapreduce.us-west-2.amazonaws.com/' },
  _clientId: 1
}
Some AWS config information:
I created a VPC in the us-west-2 region, where my EMR cluster resides, and I'm triggering the Lambda function in that same region (I confirmed this by logging process.env.AWS_REGION).
I set up a subnet that had previously been created inside this same VPC. The EMR cluster is inside it and the Lambda function has access to it.
I set up a security group in this same VPC with all inbound and outbound traffic allowed (all ports from and to 0.0.0.0/0) to rule out a configuration problem there.
I set up an execution role with the following policies attached and linked it to my Lambda function:
AWSLambdaFullAccess
AmazonElasticMapReduceFullAccess
AWSLambdaExecute
AWSLambdaVPCAccessExecutionRole
AWSLambdaRole
AWSLambdaENIManagementAccess
Finally, my code:
const AWS = require('aws-sdk');

exports.handler = (event, context, callback) => {
  // No explicit credentials: inside Lambda the SDK reads the execution role's
  // temporary credentials (including the session token) from the environment.
  const emr = new AWS.EMR({
    apiVersion: '2009-03-31',
    region: process.env.AWS_REGION
  });

  const flowSteps = {
    JobFlowId: process.env['JOB_FLOW_ID'],
    Steps: [{
      Name: "my_beautiful_step",
      ActionOnFailure: "CANCEL_AND_WAIT",
      HadoopJarStep: {
        Jar: "command-runner.jar",
        Args: [
          "spark-submit",
          "--master", "yarn",
          ...
          ...
          ...
        ]
      }
    }]
  };

  // Add the step to the running cluster and report the outcome to Lambda.
  emr.addJobFlowSteps(flowSteps, (err, data) => {
    if (err) {
      console.log('ERROR', err, err.stack);
      callback(err);
    } else {
      console.log('NO ERROR', data);
      callback(null, data);
    }
  });
};
EDIT: I tried communicating with S3 (getting a bucket location) just to test whether the problem was only with EMR, but that call also times out.
Well, I solved my issue. Basically, you can't call AWS API endpoints from inside a VPC that has no internet access, because most AWS services are exposed through a public URL, e.g., https://elasticmapreduce.us-west-2.amazonaws.com. You can see this clearly when you log the EMR client object (the same applies to other client objects such as S3, as I verified):
Service {
  config:
    Config {
      ...
      ...
      region: 'us-west-2',
      logger: null,
      apiVersions: {},
      apiVersion: null,
      endpoint: 'elasticmapreduce.us-west-2.amazonaws.com',
      httpOptions: { timeout: 120000 },
      maxRetries: undefined,
    },
  endpoint:
    Endpoint {
      protocol: 'https:',
      host: 'elasticmapreduce.us-west-2.amazonaws.com',
      port: 443,
      hostname: 'elasticmapreduce.us-west-2.amazonaws.com',
      pathname: '/',
      path: '/',
      href: 'https://elasticmapreduce.us-west-2.amazonaws.com/'
    },
  ...
}
That said, AWS provides VPC endpoints, which expose some services on private addresses inside the VPC, so you can reach those services without internet access. For any service that is not available through a VPC endpoint (EMR, in my case), you have to set up a NAT gateway plus an internet gateway (about US$30/month).
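As an illustration of the VPC endpoint route, here is a rough sketch of creating a gateway endpoint for S3 with the EC2 client from aws-sdk (v2). The function names and all IDs are placeholders I made up for the example. The EC2 client is passed in as an argument, so the snippet itself does not load aws-sdk. You would run this from a machine that already has internet access, not from the isolated Lambda function.

```javascript
// Builds the request for EC2's CreateVpcEndpoint call.
// vpcId and routeTableIds are placeholders for your own resources.
function gatewayEndpointParams({ vpcId, region, routeTableIds }) {
  return {
    VpcId: vpcId,
    ServiceName: `com.amazonaws.${region}.s3`,
    VpcEndpointType: 'Gateway',
    RouteTableIds: routeTableIds,
  };
}

// Creates the endpoint with the given EC2 client and returns its ID.
async function createS3GatewayEndpoint(ec2, options) {
  const params = gatewayEndpointParams(options);
  const result = await ec2.createVpcEndpoint(params).promise();
  return result.VpcEndpoint.VpcEndpointId;
}

// Usage (assumes aws-sdk v2 is installed and credentials are configured):
// const AWS = require('aws-sdk');
// createS3GatewayEndpoint(new AWS.EC2({ region: 'us-west-2' }), {
//   vpcId: 'vpc-xxxxxxxx',            // placeholder
//   region: 'us-west-2',
//   routeTableIds: ['rtb-xxxxxxxx'],  // placeholder
// }).then(console.log, console.error);
```

A gateway endpoint adds a route to the listed route tables, so the subnet used by the Lambda function must be associated with one of them.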