node.js amazon-web-services amazon-dynamodb aws-sdk-js

What should be done when the provisioned throughput is exceeded?

I'm using AWS SDK for Javascript (Node.js) to read data from a DynamoDB table. The auto scaling feature does a great job during most of the time and the consumed Read Capacity Units (RCU) are really low most part of the day. However, there's a programmed job that is executed around midnight which consumes about 10x the provisioned RCU and since the auto scaling takes some time to adjust the capacity, there are a lot of throttled read requests. Furthermore, I suspect my requests are not being completed (though I can't find any exceptions in my error log).

In order to handle this situation, I've considered increasing the provisioned RCU using the AWS API (updateTable) but calculating the number of RCU my application needs may not be straightforward.

So my second guess was to retry failed requests and simply wait for auto scale increase the provisioned RCU. As pointed out by AWS docs and some Stack Overflow answers (particularlly about ProvisionedThroughputExceededException):

The AWS SDKs for Amazon DynamoDB automatically retry requests that receive this exception. So, your request is eventually successful, unless the request is too large or your retry queue is too large to finish.

I've read similar questions (this one, this one and this one) but I'm still confused: is this exception raised if the request is too large or the retry queue is too large to finish (therefore after the automatic retries) or actually before the retries?

Most important: is that the exception I should be expecting in my context? (so I can catch it and retry until auto scale increases the RCU?)

Solution

Yes.

Every time your application sends a request that exceeds your capacity you get ProvisionedThroughputExceededException message from Dynamo. However your SDK handles this for you and retries. The default Dynamo retry time starts at 50ms, the default number of retries is 10, and backoff is exponential by default.

This means you get retries at:

50ms
100ms
200ms
400ms
800ms
1.6s
3.2s
6.4s
12.8s
25.6s

If after the 10th retry your request has still not succeeded, the SDK passes the ProvisionedThroughputExceededException back to your application and you can handle it how you like.

You could handle it by increasing throughput provision but another option would be to change the default retry times when you create the Dynamo connection. For example

new AWS.DynamoDB({maxRetries: 13, retryDelayOptions: {base: 200}});

This would mean you retry 13 times, with an initial delay of 200ms. This would give your request a total of 819.2s to complete rather than 25.6s.