amazon-web-services aws-lambda amazon-dynamodb amazon-comprehend

Lambda trigger is not working as intended with bulk data


I'm using Lambda triggers to detect an insertion into a DynamoDB table (Tweets). Once triggered, I want to take the message in the event and get its sentiment using Comprehend. I then want to update a second DynamoDB table (SentimentAnalysis), where I ADD 1 to a counter that depends on the sentiment.

This works fine if I manually insert a single item, but I want to be able to use the Twitter API to insert bulk data into my DynamoDB table and have every tweet analysed for its sentiment. The Lambda function works fine if the count specified in the Twitter params is <= 5, but anything above that causes an issue with the update to the SentimentAnalysis table, and the trigger keeps repeating itself with no sign of progress or stopping.

This is my Lambda code:

let AWS = require("aws-sdk");

let comprehend = new AWS.Comprehend();

let documentClient = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context) => {

    event.Records.forEach(record => {

        if (record.eventName == "INSERT") {

            //console.log(JSON.stringify(record.dynamodb.NewImage.tweet.S));

            let params = {
                LanguageCode: "en",
                Text: JSON.stringify(record.dynamodb.NewImage.tweet.S)
            };



            comprehend.detectSentiment(params, (err, data) => {
                if (err) {
                    console.log("\nError with call to Comprehend:\n " + JSON.stringify(err));
                } else {
                    console.log("\nSuccessful call to Comprehend:\n " + data.Sentiment);


                    //when comprehend is successful, update the sentiment analysis data
                    //we can use the ADD expression to increment the value of a number
                    let sentimentParams = {
                        TableName: "SentimentAnalysis",
                        Key: {
                            city: record.dynamodb.NewImage.city.S,
                        },
                        UpdateExpression: "ADD " + data.Sentiment.toLowerCase() + " :pr",
                        ExpressionAttributeValues: {
                            ":pr": 1
                        }
                    };


                    documentClient.update(sentimentParams, (err, data) => {
                        if (err) {
                            console.error("Unable to read item " + JSON.stringify(sentimentParams.TableName));
                        } else {
                            console.log("Successful Update: " + JSON.stringify(data));
                        }
                    });


                }


            });

        }
    });
};

This is the image of a successful call; it works with the first few tweets.

This is the unsuccessful call right after the first image. The request always times out.


Solution

  • The timeout is why the trigger keeps repeating. If the Lambda times out or otherwise fails, the whole batch of stream records is delivered again, so the same tweets are reprocessed. You need to handle this in your code, because delivery is “at least once”. You also need to find the cause of the timeout. The fix might be as simple as a smaller batch size or a longer timeout on the Lambda, or it might need a more complex solution using Step Functions.
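
As a rough illustration of handling "at least once" delivery, here is a minimal sketch of a handler that awaits every call and skips records it has already seen. It is not a drop-in fix. The three injected functions (detectSentiment, incrementCount, claimEvent) and the makeHandler name are hypothetical. In a real function the first two would presumably wrap comprehend.detectSentiment(params).promise() and documentClient.update(params).promise() from the aws-sdk used in the question, and claimEvent would need some durable store, for example a conditional write keyed on the stream record's eventID.

```javascript
// Sketch only: the injected functions are hypothetical wrappers, not AWS APIs.
//   detectSentiment(text)           -> Promise<string>  (e.g. "POSITIVE")
//   incrementCount(city, sentiment) -> Promise          (adds 1 to that city's counter)
//   claimEvent(eventID)             -> Promise<boolean> (false if already handled)
function makeHandler({ detectSentiment, incrementCount, claimEvent }) {
    return async (event) => {
        let processed = 0;

        // Handle records one at a time and await each call, so no work is
        // still pending when the handler returns. If any call rejects, the
        // invocation fails and the stream delivers the batch again.
        for (const record of event.Records) {
            if (record.eventName !== "INSERT") {
                continue;
            }

            // A redelivered batch can contain records that were already
            // counted, so skip any eventID that has been claimed before.
            const isNew = await claimEvent(record.eventID);
            if (!isNew) {
                continue;
            }

            const image = record.dynamodb.NewImage;
            const sentiment = await detectSentiment(image.tweet.S);
            await incrementCount(image.city.S, sentiment.toLowerCase());
            processed += 1;
        }

        return { processed };
    };
}
```

Claiming a record before updating the counter means a failure between the two steps loses that one count rather than counting it twice. Whether that trade-off is acceptable depends on the application. Processing sequentially is slower than firing all the calls at once, but it keeps the number of concurrent Comprehend requests at one, which may or may not be related to the timeout seen here.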