Tags: node.js, amazon-web-services, aws-lambda, amazon-cognito, amazon-cognito-triggers

lambda trigger callback vs context.done


I was following the guide here for setting up a presignup trigger.

However, when I used callback(null, event), my Lambda function would never actually return, and I would end up getting this error:

{ code: 'UnexpectedLambdaException', name: 'UnexpectedLambdaException', message: 'arn:aws:lambda:us-east-2:642684845958:function:proj-dev-confirm-1OP5DB3KK5WTA failed with error Socket timeout while invoking Lambda function.' }

I found a similar link here that says to use context.done().

After switching it works perfectly fine.

What's the difference?

exports.confirm = (event, context, callback) => {
    event.response.autoConfirmUser = true;
    context.done(null, event);
    //callback(null, event); does not work
}

Solution

  • In the original Lambda runtime environment for Node.js 0.10, Lambda provided helper functions in the context object: context.done(err, res), context.succeed(res), and context.fail(err).

    These were formerly documented, but the documentation has since been removed.

    "Using the Earlier Node.js Runtime v0.10.42" is an archived copy of a page that no longer exists in the Lambda documentation; it explains how these methods were used.

    When the Node.js 4.3 runtime for Lambda was launched, these remained for backwards compatibility (and remain available but undocumented), and callback(err, res) was introduced.

    Here's the nature of your problem, and why the two solutions you found actually seem to solve it.

    Context.succeed, context.done, and context.fail however, are more than just bookkeeping – they cause the request to return after the current task completes and freeze the process immediately, even if other tasks remain in the Node.js event loop. Generally that’s not what you want if those tasks represent incomplete callbacks.

    https://aws.amazon.com/blogs/compute/node-js-4-3-2-runtime-now-available-on-lambda/

    So with callback, Lambda functions now behave in a more paradigmatically correct way, but this is a problem if you intend for certain objects to remain on the event loop during the freeze that occurs between invocations -- unlike the old (deprecated) done, fail, and succeed methods, calling the callback doesn't suspend things immediately. Instead, Lambda waits for the event loop to be empty.

    context.callbackWaitsForEmptyEventLoop -- default true -- was introduced so that you can set it to false for those cases where you want the Lambda function to return immediately after you call the callback, regardless of what's happening in the event loop. The default is true because false can mask bugs in your function and can cause very erratic/unexpected behavior if you fail to consider the implications of container reuse -- so you shouldn't set this to false unless and until you understand why it is needed.

    A common reason false is needed would be a database connection made by your function. If you create a database connection object in a global variable, it will have an open socket, and potentially other things like timers, sitting on the event loop. This prevents the callback from causing Lambda to return a response, until these operations are also finished or the invocation timeout timer fires.
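A rough local illustration of that situation follows. No real database is involved: the connection object and the helper names (getConnection, eventLoopIsHeldOpen, closeConnection) are hypothetical, and a repeating timer stands in for the open socket a real client would hold.

```javascript
// Hypothetical stand-in for a database client kept in a global variable.
// A real client would hold an open socket; a repeating timer plays that role here.
let connection = null;

function getConnection() {
    if (connection === null) {
        // Created once per container and reused by later invocations.
        connection = { keepAlive: setInterval(() => {}, 30000) };
    }
    return connection;
}

function eventLoopIsHeldOpen() {
    // hasRef() is true while the timer keeps the Node.js event loop alive.
    return connection !== null && connection.keepAlive.hasRef();
}

function closeConnection() {
    if (connection !== null) {
        clearInterval(connection.keepAlive);
        connection = null;
    }
}
```

Once getConnection() has run, the event loop is never empty, so a plain callback(null, event) would wait until the invocation times out. closeConnection exists only so the sketch can exit when run locally; a Lambda function that reuses its connection would normally leave it open.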

    Identify why you need to set this to false, and if it's a valid reason, then it is correct to use it.

    Otherwise, your code may have a bug that you need to understand and fix, such as leaving requests in flight or other work unfinished, when calling the callback.

    So, how do we parse the Cognito error? At first, it seemed pretty unusual, but now it's clear that it is not.

    When a function never returns, Lambda will throw an error saying that the task timed out after the configured number of seconds. You should find this to be what happens when you test your function in the Lambda console.

    Unfortunately, Cognito appears to have taken an internal design shortcut when invoking a Lambda function. Instead of waiting for Lambda to time out the invocation (which could tie up resources inside Cognito) or imposing its own explicit timer on the maximum duration Cognito will wait for a Lambda response, it relies on a lower-layer socket timer to constrain this wait... thus an "unexpected" error is thrown while invoking the function.

    Interpreting the error message is further complicated by missing quotes where the lower-layer exception is interpolated.

    To me, the problem would be much more clear if the error read like this:

    'arn:aws:lambda:...' failed with error 'Socket timeout' while invoking Lambda function
    

    This format would more clearly indicate that while Cognito was invoking the function, it threw an internal Socket timeout error (as opposed to Lambda encountering an unexpected internal error, which was my original -- and incorrect -- assumption).

    It's quite reasonable for Cognito to impose some kind of response time limit on the Lambda function, but I don't see this documented. I suspect a short timeout on your Lambda function itself (making it fail more promptly) would cause Cognito to throw a somewhat more useful error, but in my mind, Cognito should have been designed to include logic to make this an expected, defined error, rather than categorizing it as "unexpected."